Anubhav Agarwal

Reputation: 2062

Classify tweets into categories

I want to write a script that parses a user's tweets and classifies each one into a previously specified category. For example:

"Ed Miliband will lose election if he is 'seduced' by Blairites, says union chief http://bit.ly/145CRAD"

should be classified under the domain Politics.

"Dear Sachin, you're 40. Buy a sports car, have flings with 20 yr old blondes. Enjoy your midlife crisis. Leave IPL for the boys - your fan"

should be classified under the domain Cricket.

What is the best way to do this?

Upvotes: 1

Views: 3064

Answers (4)

Jaspn Wjbian

Reputation: 279

How about LDA (Latent Dirichlet Allocation), a topic model?

You can try online LDA in Python:

http://www.cs.princeton.edu/~blei/topicmodeling.html

Then, if you want a distributed (and faster) LDA,

you can try LightLDA.

Upvotes: 0

Diego

Reputation: 18349

This is a complex problem in the field of Natural Language Processing (NLP) called document classification. One of the best open source libraries out there is maintained by The Stanford NLP Group. Good luck!
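As a rough illustration of what document classification involves, here is a minimal sketch of a multinomial Naive Bayes classifier using only the Python standard library. This is not the Stanford NLP Group's library or its API; the class name `NaiveBayes`, the helper `tokenize`, and the four training tweets are all made up for the example, and a real system would need far more labelled data per category.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()                       # every word seen during training

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Start from the log prior, then add one log likelihood per token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training examples, for illustration only.
clf = NaiveBayes()
clf.train("The prime minister will lose the election, says union chief", "Politics")
clf.train("Parliament votes on the new budget bill", "Politics")
clf.train("Sachin scores a century in the IPL final", "Cricket")
clf.train("The bowler took five wickets in the test match", "Cricket")

print(clf.classify("Union chief warns the party before the election"))
print(clf.classify("Leave IPL for the boys"))
```

With this toy training set the first tweet is assigned to Politics and the second to Cricket, because words such as "election" and "IPL" appear only in one category's training text. Accuracy on real tweets depends almost entirely on how much labelled data you can collect.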

Upvotes: 1

Ian Mercer

Reputation: 39277

You are looking for a 'Topic Model'. Techniques include Latent Dirichlet Allocation and others. The Wikipedia article includes links to resources such as Mallet which should help you.

You didn't specify which language you want to use, or what 'best' means: easiest to implement, fastest, or best results?

Another alternative is to use humans (e.g. Amazon Mechanical Turk), which may give you the 'best' results for tweets, which are notoriously hard to classify given all the abbreviations, sarcasm, and hashtags ... #notAnEasyProblem.
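Whichever approach you pick, some of the noise in tweets can be reduced before classification. The following is only a sketch of one possible preprocessing step, assuming you want to drop URLs and @mentions and split camel-case hashtags into words; the function names `normalize_tweet` and `split_hashtag` are hypothetical, and it does nothing about sarcasm or abbreviations.

```python
import re

URL_RE = re.compile(r"https?://\S+")          # links carry little topical signal
MENTION_RE = re.compile(r"@\w+")              # @username mentions
HASHTAG_RE = re.compile(r"#(\w+)")            # captures the tag without the '#'
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])") # boundary between 'x' and 'Y' in 'xY'


def split_hashtag(match):
    """Turn a camel-case tag such as 'notAnEasyProblem' into separate words."""
    return CAMEL_RE.sub(" ", match.group(1))


def normalize_tweet(text):
    """Return lowercase word tokens with URLs and mentions removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(split_hashtag, text)
    return re.findall(r"[a-z0-9']+", text.lower())


print(normalize_tweet("Leave IPL for the boys http://bit.ly/145CRAD #notAnEasyProblem"))
```

The resulting tokens can then be fed to a topic model or a classifier in place of the raw tweet text.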

Upvotes: 4

miguelmalvarez

Reputation: 930

  1. Topic categorisation (traditional classification techniques)
  2. Entity extraction and more sophisticated techniques to identify topics related to, for instance, people or Twitter accounts.

These papers would be a good place to start: http://dl.acm.org/citation.cfm?id=1835643 and http://www.tmrfindia.org/ijcsa/v9i15.pdf

Upvotes: 1
