Kumar

Reputation: 1017

simple nltk sentiment analysis code using python3

I am trying to do some classification on customer emails.

  1. Is the email happy or sad (sentiment analysis)
  2. Is the email related to billing or not.

I am using Python 3 and think I have to use NLTK and scikit-learn. I believe NLTK will help read and understand the text, and scikit-learn will do the classification (happy or sad, billing or not).

Training data set 1: a few phrases, anywhere from one word to a sentence of 5 to 6 words, each labeled 1 for happy or 0 for not happy (a few examples below).

Training data set 2: a few phrases indicating a billing-related question (a few examples below).

This seems straightforward conceptually. Where can I find some basic code that shows me:

  1. how I can use my own training data
  2. how I can load the email text as input and get an answer back: happy or sad, and billing or not.

Upvotes: 0

Views: 1528

Answers (1)

clemtoy

Reputation: 1731

Given your data sets, your approach is already close to lexicon-based, since each item contains very few words.

For billing, a lexicon-based approach should work well: match the email against a list of billing-related terms. Give extra weight to the email subject line.
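As a rough illustration of that idea, here is a minimal sketch using only the standard library. The term list, the `subject_weight` value and the `threshold` are all assumptions made up for the example, not recommended settings; you would build the lexicon from your own billing phrases.

```python
import re

# Hypothetical billing lexicon; replace with terms taken from your own data.
BILLING_TERMS = {"invoice", "bill", "billing", "charge", "charged",
                 "refund", "payment"}


def count_hits(text):
    """Count how many tokens of the text appear in the billing lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for tok in tokens if tok in BILLING_TERMS)


def billing_score(subject, body, subject_weight=3.0):
    """Weight subject-line hits more heavily than body hits (assumed weight)."""
    return subject_weight * count_hits(subject) + count_hits(body)


def is_billing(subject, body, threshold=1.0):
    """Flag the email as billing-related when the score reaches the threshold."""
    return billing_score(subject, body) >= threshold
```

For example, `is_billing("Question about my invoice", "Hello")` is flagged as billing, while an email with no lexicon terms in either field is not.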

For sentiment analysis you have two options:

  • Machine learning: In this case you should use a bigger data set (in my view, each item should be a full email). You can implement a Naive Bayes classifier following this tutorial.

  • Lexicon-based approach: There are several lexicons for sentiment analysis, e.g. SentiWordNet (downloadable through nltk.download()), MPQA, SentiStrength, and WordNet-Affect via WNAffect. Preprocessing: tokenization (nltk.word_tokenize()) and POS tagging (nltk.pos_tag() on the resulting tokens). You should also handle negation; polarity shifting (flipping the polarity of words that follow a negator such as "not") is a good way to do it.
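To make the lexicon-based option concrete, here is a small sketch of scoring with polarity shifting. It is an illustration under assumptions: the tiny `LEXICON`, the `NEGATORS` set and the two-word negation `window` are invented for the example, and a regular expression stands in for nltk.word_tokenize() so that the code runs without downloads. With a real lexicon such as SentiWordNet you would look scores up there instead.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
LEXICON = {"happy": 1.0, "great": 1.0, "good": 1.0, "love": 1.0,
           "sad": -1.0, "bad": -1.0, "terrible": -1.0, "angry": -1.0}

# Hypothetical list of negation words that trigger polarity shifting.
NEGATORS = {"not", "no", "never", "don't", "isn't", "wasn't", "can't"}


def sentiment_score(text, window=2):
    """Sum word polarities, flipping the sign of words that follow a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    negate_left = 0  # how many upcoming tokens are still under negation
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = window
            continue
        polarity = LEXICON.get(tok, 0.0)
        if negate_left > 0:
            polarity = -polarity
            negate_left -= 1
        score += polarity
    return score


def label(text):
    """Return 1 (happy) for a positive score, otherwise 0 (not happy)."""
    return 1 if sentiment_score(text) > 0 else 0
```

With this lexicon, "I am happy" gets label 1 and "I am not happy" gets label 0, because "happy" falls inside the negation window opened by "not".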

Machine learning usually gives the best results, so if you have enough annotated emails it is the better choice.
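To show how your own labeled phrases feed into a Naive Bayes classifier, here is a dependency-free sketch of a multinomial Naive Bayes with add-one smoothing. It is not the code from the tutorial mentioned above, the `NaiveBayes` class and the training phrases are invented for illustration, and in practice you would more likely use a tested implementation such as NLTK's NaiveBayesClassifier or scikit-learn's MultinomialNB.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()
        for text, lab in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[lab].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for lab, n_docs in self.label_counts.items():
            logp = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[lab].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[lab][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = lab, logp
        return best_label


# Hypothetical training data in the question's format (1 = happy, 0 = not happy).
train_texts = ["very happy with the service", "great support thank you",
               "love it", "not happy at all", "terrible experience",
               "very disappointed"]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict(email_text)` then returns 1 or 0 for a new email. The same class can be trained a second time on the billing phrases, with labels such as "billing" and "other", to answer the second question.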

Upvotes: 3
