jest3r

Reputation: 73

PoS Implementation with Naive Bayes Sentiment Analysis

I am trying to apply sentiment analysis (classifying tweets as positive or negative) to a relatively large dataset (10,000 rows). So far I have achieved only ~73% accuracy using Naive Bayes with my feature-extraction method "final", shown below. I want to add part-of-speech (PoS) tags to help with the classification, but I am unsure how to implement this. I wrote a simple function called "pos" (posted below) and tried using the tags of my cleaned dataset as features, but that gave only around 52% accuracy. Can anyone point me in the right direction for adding PoS to my model? Thank you.

import nltk

def pos(tokens):
    # nltk.pos_tag expects a list of tokens (one sentence), not a raw string
    return [tag for _, tag in nltk.pos_tag(tokens)]


def final(text):

   """
   I have code here to remove URLs,hashtags, 
   stopwords,usernames,numerals, and punctuation.
   """

   #lemmatization
   finished = []
   for x in clean:
      finished.append(lem.lemmatize(x))

   return finished

Upvotes: 1

Views: 345

Answers (1)

0x5050

Reputation: 1231

You should first split the tweets into sentences and then tokenize. NLTK provides a method for this.

   from nltk.tokenize import sent_tokenize
   sents = sent_tokenize(tweet)

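After this, tokenize each sentence into words (for example with nltk.word_tokenize) and pass each token list to nltk.pos_tag, or pass all of them at once to nltk.pos_tag_sents. The tagger expects a list of tokens per sentence rather than raw strings, so tagging sentence by sentence should give more accurate POS tags.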
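One likely reason the tags-only features scored around 52% is that the tags by themselves discard the words, which carry most of the sentiment signal. A minimal sketch of one way to combine both is below. It assumes the input is a list of (word, tag) pairs such as nltk.pos_tag returns; the function name pos_features, the feature-key format, and the hand-tagged example tweet are hypothetical and only for illustration.

```python
def pos_features(tagged_tokens):
    """Build a feature dict from (word, tag) pairs.

    tagged_tokens is assumed to look like the output of nltk.pos_tag,
    e.g. [("love", "VBP"), ("phone", "NN")].
    """
    features = {}
    for word, tag in tagged_tokens:
        token = word.lower()
        # Keep the word itself so the lexical signal is not lost.
        features["word=" + token] = True
        # The word/tag pair separates uses such as "like" as a verb
        # from "like" as a preposition.
        features["word_tag=" + token + "/" + tag] = True
        # Record that this tag occurs somewhere in the tweet.
        features["has_tag=" + tag] = True
    return features


# Hypothetical hand-tagged tweet, standing in for real nltk.pos_tag output.
tagged = [("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("phone", "NN")]
example = pos_features(tagged)
```

A dict like this can be paired with a label, as in (pos_features(tagged), "pos"), and a list of such pairs is the training input that nltk.NaiveBayesClassifier.train expects. Whether the word/tag features actually improve on the 73% baseline would need to be checked on held-out data.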

Upvotes: 1
