Using Hadoop to Analyze President Barack Obama’s State of the Union Addresses

Hadoop_Obama

Idea

The State of the Union is the address that the President of the United States presents to a joint session of the United States Congress, typically once a year. The goal of this project is to use Hadoop’s MapReduce programming model to analyze all the speeches delivered by the current President of the United States, Barack Obama, and to find the most commonly used words after filtering out common stop words. The results will be presented as a histogram or a similar graph.
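To make the idea concrete, here is a minimal sketch of the word count in plain Python. It only simulates the map, shuffle, and reduce phases locally and is not the project's actual Hadoop job; on a cluster the same two functions could be adapted to run through Hadoop Streaming. The names `map_phase`, `reduce_phase`, `word_count`, `top_words`, and `text_histogram` are hypothetical, the tiny `STOP_WORDS` set is a stand-in for the full stop-word list, and the sample lines are placeholder text rather than real speech data.

```python
import re
from collections import defaultdict

# Assumption: a tiny stand-in set; the project would load the full stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "we", "our"}

# Words are runs of lowercase letters and apostrophes (a simplifying assumption).
TOKEN = re.compile(r"[a-z']+")


def map_phase(line):
    """Map step: emit (word, 1) for every non-stop word in one line of text."""
    for word in TOKEN.findall(line.lower()):
        word = word.strip("'")
        if word and word not in STOP_WORDS:
            yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle-and-sort stage."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


def top_words(counts, n):
    """Return the n most frequent words, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def text_histogram(ranked):
    """Render (word, count) pairs as a simple text histogram."""
    return "\n".join(f"{word:<12}{'#' * count} {count}" for word, count in ranked)


if __name__ == "__main__":
    # Placeholder lines, not taken from any actual speech.
    sample = ["The jobs and the energy", "Jobs of the future"]
    print(text_histogram(top_words(word_count(sample), 3)))
```

Keeping the map and reduce steps as separate functions mirrors how the work would be split between mappers and reducers, and the ranked output can be fed to any plotting tool to draw the final histogram.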

Data/References

Speech Data

http://www.whitehouse.gov

Stop Words

https://code.google.com/p/stop-words/
