This project presents a fast and easy way to analyze the n-gram data contributed by Google Inc. by building a co-occurrence network from it. The Google n-gram data contains a huge number of word sequences and their frequency counts, collected from text on public web pages, and its sheer size makes the whole data set hard to analyze. We present a parallel approach to building and accessing a co-occurrence network from the Google n-gram data, and we use the network to find relationships (paths) between words in this large corpus. We also provide a common C/MPI library for similar co-occurrence network analysis programs. The method was tested on both the Blade and Altix systems at the Minnesota Supercomputing Institute (MSI), University of Minnesota, Twin Cities campus.
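To make the idea of a co-occurrence network and a path between words concrete, here is a minimal serial sketch in C. It is not the project's library: the names Network, net_init, word_id, add_bigram and find_path are hypothetical, the fixed-size adjacency matrix is an assumption that only suits a toy vocabulary, and MPI is left out entirely. It treats each bigram as an undirected weighted edge and uses breadth-first search to find a path with the fewest edges.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 64   /* toy vocabulary limit (assumption) */
#define MAX_LEN   32   /* maximum stored word length, including '\0' */

/* Hypothetical in-memory network: vocabulary plus a matrix of edge counts. */
typedef struct {
    char words[MAX_WORDS][MAX_LEN];
    int  n;                              /* number of distinct words */
    long weight[MAX_WORDS][MAX_WORDS];   /* co-occurrence counts */
} Network;

static void net_init(Network *net) {
    memset(net, 0, sizeof *net);
}

/* Return the index of word w, adding it if unseen; -1 if the table is full. */
static int word_id(Network *net, const char *w) {
    for (int i = 0; i < net->n; i++)
        if (strcmp(net->words[i], w) == 0)
            return i;
    if (net->n == MAX_WORDS)
        return -1;
    strncpy(net->words[net->n], w, MAX_LEN - 1);
    net->words[net->n][MAX_LEN - 1] = '\0';
    return net->n++;
}

/* Record one bigram "a b" with its frequency as an undirected edge. */
static int add_bigram(Network *net, const char *a, const char *b, long count) {
    int i = word_id(net, a);
    int j = word_id(net, b);
    if (i < 0 || j < 0)
        return -1;
    net->weight[i][j] += count;
    if (i != j)
        net->weight[j][i] += count;
    return 0;
}

/* Breadth-first search from src to dst. Writes the word indices of a
 * fewest-edges path into path (capacity MAX_WORDS) and returns the number
 * of words on it, or 0 if the two words are not connected. */
static int find_path(const Network *net, int src, int dst, int *path) {
    int prev[MAX_WORDS], queue[MAX_WORDS];
    int head = 0, tail = 0, len = 0;

    assert(src >= 0 && src < net->n && dst >= 0 && dst < net->n);
    for (int i = 0; i < net->n; i++)
        prev[i] = -1;
    prev[src] = src;                 /* the source is its own predecessor */
    queue[tail++] = src;

    while (head < tail && prev[dst] == -1) {
        int u = queue[head++];
        for (int v = 0; v < net->n; v++) {
            if (net->weight[u][v] > 0 && prev[v] == -1) {
                prev[v] = u;
                queue[tail++] = v;
            }
        }
    }
    if (prev[dst] == -1)
        return 0;

    /* Walk back from dst to src, then reverse into forward order. */
    for (int v = dst; ; v = prev[v]) {
        path[len++] = v;
        if (v == src)
            break;
    }
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        int t = path[i];
        path[i] = path[j];
        path[j] = t;
    }
    return len;
}
```

For example, after calling add_bigram for "new york", "york city" and "city hall", find_path from "new" to "hall" yields the four-word path new, york, city, hall. A parallel version would presumably partition the n-gram files across MPI processes and merge or distribute the resulting edges, but that design is not described here and is left out of the sketch.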
Anurag Jain <jainx086 at d dot umn dot edu>
Bin Lan <lanxx019 at d dot umn dot edu>
Darshan Paranjape <para0101 at d dot umn dot edu>
Vishnu Praveen Pedireddi <pedir001 at d dot umn dot edu>
FLAMENGO PROJECT REPORT
FLAMENGO PROJECT PRESENTATION