What are People Talking About on Twitter? A Research Perspective

In: Tech Tips

29 Nov 2009

Twitter is buzzing with real-time discussions on a wide array of topics. It’s an intriguing place for people to discuss and share thoughts. The catch is the length limit: 140 characters per tweet. Short, real-time messages like these give researchers a lot of scope to study tweets and draw conclusions about what people are talking about, what kind of emotions they express, and what factors primarily contribute to changes in conversations.


Let me present some intriguing semantic tools and a research perspective, which is just the tip of the iceberg with respect to Twitter data.

OpenCalais

OpenCalais is a toolkit that allows you to incorporate state-of-the-art semantic functionality into your blog, content management system, website or application.

A brief walkthrough:

OpenCalais can be used to extract real world entities from a corpus of text. It attaches rich semantic metadata to the content you submit. Calais categorizes and links your document with entities (people, places, organizations, etc.), facts (person “x” works for company “y”), and events (person “z” was appointed chairman of company “y” on date “x”).

For instance, here is a chunk of text from a CNN article processed via OpenCalais.

[Image: OpenCalais output for the CNN article text, with extracted entities highlighted]

As you can see from the above image, OpenCalais returns real-world entities from a wide array of categories, highlighted in different colors by category. The extracted entities include organization and person names, services, books, movies, events, social tags, TV shows, and city and country names.
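The grouping by category described above can be sketched in a few lines of Python. This is only an illustration: the record shape (`kind`, `type`, `name`) is a hypothetical simplification and not the actual OpenCalais response format, and "x" and "y" are the placeholder names used earlier in this post.

```python
from collections import defaultdict

# Hypothetical, simplified records; NOT the actual OpenCalais response format.
# "x" and "y" are the placeholder person and company from the text above.
records = [
    {"kind": "entity", "type": "Person", "name": "x"},
    {"kind": "entity", "type": "Company", "name": "y"},
    {"kind": "fact", "type": "PersonWorksFor", "name": "x works for y"},
]


def group_by_type(items):
    """Map each (kind, type) pair to the list of names found for it."""
    grouped = defaultdict(list)
    for item in items:
        grouped[(item["kind"], item["type"])].append(item["name"])
    return dict(grouped)


grouped = group_by_type(records)
for (kind, type_), names in sorted(grouped.items()):
    print(f"{kind:7} {type_:15} {', '.join(names)}")
```

Keying on the (kind, type) pair keeps entities, facts and events apart even if two of them happen to share a type name.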

You can also visit the OpenCalais showcase for samples to see how developers have implemented OpenCalais in different ways.


Just like the test case above, Twitter data (tweets) on a specific topic can be extracted and processed via OpenCalais to get real-world events, names, social tags, place names and so on, collectively termed semantic metadata. This data helps researchers analyze which words people use. It supports further analysis such as word frequency, repeated usage of specific words, and the word categories used. This helps us analyze what people are talking about, what word combinations appear in tweets, and what factors contribute to changes in conversations. Those changes can include emotional charge, sentiment, the number of people joining, retweets and so on.
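As a rough illustration of the word-frequency step, here is a minimal Python sketch using only the standard library. The stop-word list, the tokenizing rule and the two sample tweets are all assumptions made up for this example; they are not real data and not part of OpenCalais.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "is", "to", "of", "and", "in", "on", "rt"}


def tokenize(tweet):
    """Lowercase a tweet and return its tokens, keeping #hashtags and @mentions."""
    return re.findall(r"[#@]?\w+", tweet.lower())


def word_frequencies(tweets):
    """Count how often each non-stop-word token appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(t for t in tokenize(tweet) if t not in STOP_WORDS)
    return counts


# Placeholder tweets written for this example, not real data.
sample_tweets = [
    "RT @user: Swine flu vaccine news",
    "Worried about the swine flu #health",
]
freq = word_frequencies(sample_tweets)
print(freq.most_common(3))
```

Running the same count on batches of tweets from different time windows would be one simple way to see which words rise or fall as a conversation changes.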

LIWC is a similar tool for analyzing content, but it extracts the word categories used in the content, such as pronouns, nouns, verbs, insight, work and religion. LIWC's documentation lists all the categories it extracts. In short, LIWC can be used to extract syntactic metadata from a chunk of text. So, in our case, tweets processed via LIWC return the proportions of the word categories used. Following is a sample LIWC output.

[Image: sample LIWC output showing word-category proportions]
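To make the idea of word-category proportions concrete, here is a minimal Python sketch. It does not use LIWC itself: the three-category dictionary below is a hypothetical stand-in invented for this example, and the real LIWC dictionary is far larger.

```python
import re

# Hypothetical mini-dictionary for illustration only; this is NOT the LIWC
# dictionary. Category names follow those mentioned in the text above.
CATEGORY_WORDS = {
    "pronouns": {"i", "me", "you", "we", "they", "it"},
    "work": {"job", "work", "office", "boss"},
    "religion": {"church", "pray", "faith"},
}


def category_proportions(text):
    """Return, per category, the percentage of words in `text` that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORY_WORDS}
    return {
        name: 100.0 * sum(w in vocab for w in words) / len(words)
        for name, vocab in CATEGORY_WORDS.items()
    }


props = category_proportions("I pray we keep the job")
for name, pct in props.items():
    print(f"{name:10} {pct:.1f}%")
```

Each tweet (or batch of tweets) then becomes one row of numbers, one column per category, which is the form the later statistical step needs.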

Thus we have both syntactic and semantic metadata to analyze tweets (and words in general). Now, let me walk you through a roadmap for studying a corpus of tweets to find which word combinations are used together.

[Image: roadmap for analyzing a corpus of tweets]

Briefly, OpenCalais and LIWC are two tools for extracting semantic and syntactic metadata from a chunk of text. With these two tools, the content of tweets is mapped into word categories. This data can then be processed with statistical methods like factor analysis to study which of these categories primarily influence the conversation (the volume of tweets).
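For a feel of what the statistical step does, here is a minimal pure-Python sketch. Note the swap: it is not a full factor analysis (no rotation, no multiple factors) but a simpler stand-in that extracts the dominant principal component of the correlation matrix by power iteration and reports each column's loading on it. The function names and the input numbers are assumptions made for this example, and the sketch assumes no column is constant.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (equal-length lists)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Assumes every column varies; a constant column would give a zero norm.
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(k)]
    return [
        [sum(c[a] * c[b] for c in centered) / (norms[a] * norms[b]) for b in range(k)]
        for a in range(k)
    ]


def first_factor_loadings(corr, iterations=200):
    """Loadings on the dominant component, found by power iteration."""
    k = len(corr)
    vec = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[a][b] * vec[b] for b in range(k)) for a in range(k)]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vec = [p / eigenvalue for p in product]
    # Loading = eigenvector entry scaled by the square root of the eigenvalue.
    return [v * math.sqrt(eigenvalue) for v in vec]


# Synthetic numbers chosen for illustration, not measurements from any tweets.
# Columns 0 and 1 move together; column 2 is unrelated to both.
rows = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
]
loadings = first_factor_loadings(correlation_matrix(rows))
print([round(x, 3) for x in loadings])
```

Columns that move together end up with high loadings on the same component, which is the sense in which word categories are "used in combination". For real work, a statistics package with proper factor analysis and rotation would be the better choice.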

Test Case

I extracted tweets on hot trending topics like Barack Obama, Swine Flu and Iran Elections to study what factors primarily influence changes in conversations on Twitter. Following is a sample factor interpretation from the LIWC output on the topic Barack Obama. As you can see from the following table, under factor D1 the word categories pronouns, personal pronouns, auxiliary verbs, present tense, and first-person words such as I, me and myself were used in combination more than other word categories. Words pertaining to positive and negative emotions were used in combination more under factor D2.

[Table: LIWC factor interpretation for the topic Barack Obama]
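The interpretation step described above, reading off which categories belong to which factor, can be sketched as follows. The loading values here are invented purely for illustration and are NOT the values from the table; the 0.5 cut-off and the function name are likewise assumptions.

```python
def interpret_factors(loadings, threshold=0.5):
    """Group categories under the factor where their absolute loading is largest.

    `loadings` maps category -> {factor name: loading}. Categories whose best
    absolute loading falls below `threshold` are left unassigned.
    """
    groups = {}
    for category, by_factor in loadings.items():
        factor, value = max(by_factor.items(), key=lambda kv: abs(kv[1]))
        if abs(value) >= threshold:
            groups.setdefault(factor, []).append(category)
    return groups


# Invented loadings for illustration only; NOT the values from the study.
example_loadings = {
    "pronouns": {"D1": 0.8, "D2": 0.1},
    "auxiliary verbs": {"D1": 0.7, "D2": 0.2},
    "positive emotion": {"D1": 0.1, "D2": 0.75},
    "negative emotion": {"D1": 0.2, "D2": 0.6},
    "religion": {"D1": 0.3, "D2": 0.2},
}
groups = interpret_factors(example_loadings)
for factor, categories in groups.items():
    print(factor, "->", ", ".join(categories))
```

A category that loads weakly on every factor, like "religion" in this made-up input, is simply left out of the interpretation.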

This post presents a brief research perspective on Twitter and introduces the possibilities of analyzing data across a wide array of topics.
