Reputation: 539
I'm using TextBlob for Python to do some sentiment analysis on tweets. The default analyzer in TextBlob is the PatternAnalyzer, which works reasonably well and is appreciably fast.
sent = TextBlob(tweet.decode('utf-8')).sentiment
I have now tried to switch to the NaiveBayesAnalyzer and found the runtime to be impractical for my needs. (Approaching 5 seconds per tweet.)
sent = TextBlob(tweet.decode('utf-8'), analyzer=NaiveBayesAnalyzer()).sentiment
I have used the scikit-learn implementation of the Naive Bayes classifier before and did not find it to be this slow, so I'm wondering if I'm using it right in this case.
I am assuming the analyzer is pretrained, at least the documentation states "Naive Bayes analyzer that is trained on a dataset of movie reviews." But then it also has a function train() which is described as "Train the Naive Bayes classifier on the movie review corpus." Does it internally train the analyzer before each run? I hope not.
Does anyone know of a way to speed this up?
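For what it's worth, here is a minimal timing sketch (with placeholder tweets) that should show whether the cost is in training, assuming the analyzer trains lazily the first time it is used and a shared instance therefore only trains once:
import time
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

tweets = [u"example tweet one", u"example tweet two"]  # placeholder data

# Fresh analyzer per blob (what I'm doing now).
start = time.time()
for t in tweets:
    TextBlob(t, analyzer=NaiveBayesAnalyzer()).sentiment
print("fresh analyzer per tweet: %.2fs" % (time.time() - start))

# One shared analyzer instance reused for every blob.
shared = NaiveBayesAnalyzer()
start = time.time()
for t in tweets:
    TextBlob(t, analyzer=shared).sentiment
print("shared analyzer: %.2fs" % (time.time() - start))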
Upvotes: 11
Views: 6925
Reputation: 486
In addition to the solutions above: I tried them and ran into errors (missing NLTK corpora), so in case someone else finds this question, the code below fixes those errors and also tries the PatternAnalyzer for comparison:
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer, PatternAnalyzer
import nltk
# Download the resources the analyzers rely on: the tokenizer models
# and the movie_reviews corpus the NaiveBayesAnalyzer trains on.
nltk.download('punkt')
nltk.download('movie_reviews')
tb = Blobber(analyzer=NaiveBayesAnalyzer())
tb1 = Blobber(analyzer=PatternAnalyzer())
print(tb("sentence you want to test").sentiment)
print(tb1("sentence you want to test").sentiment)
print(tb("I love the book").sentiment)
print(tb1("I love the book").sentiment)
Upvotes: 0
Reputation: 1731
Adding to Alan's very useful answer: if you have tabular data in a dataframe and want to use TextBlob's NaiveBayesAnalyzer, then this works. Just swap out word_list for your relevant series of strings.
import textblob
import pandas as pd
from textblob.sentiments import NaiveBayesAnalyzer

# Build the Blobber once so the analyzer is only trained once.
tb = textblob.Blobber(analyzer=NaiveBayesAnalyzer())
for index, row in df.iterrows():
sent = tb(row['word_list']).sentiment
df.loc[index, 'classification'] = sent[0]
df.loc[index, 'p_pos'] = sent[1]
df.loc[index, 'p_neg'] = sent[2]
The above splits the tuple that sentiment returns into three separate series.
This works if the series is all strings, but if it has mixed datatypes, as can happen in pandas with the object dtype, you might want to put a try/except block around the sentiment call to catch exceptions (see the sketch below).
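A rough sketch of that try/except variant, using the same hypothetical column names as above; rows that fail are simply skipped:
for index, row in df.iterrows():
    try:
        sent = tb(str(row['word_list'])).sentiment
    except Exception:
        continue  # skip rows that cannot be analyzed
    df.loc[index, 'classification'] = sent[0]
    df.loc[index, 'p_pos'] = sent[1]
    df.loc[index, 'p_neg'] = sent[2]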
Timing-wise, the loop processes 1000 rows in around 4.7 seconds in my tests.
Hope this is helpful.
Upvotes: 0
Reputation: 288
Yes, TextBlob will train the analyzer on each run. You can use the following code to avoid training the analyzer every time:
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer
tb = Blobber(analyzer=NaiveBayesAnalyzer())
print(tb("sentence you want to test"))
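Applied to the tweet loop from the question, this might look roughly as follows (a sketch; tweets here stands in for whatever iterable of raw tweet strings you have):
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer

tb = Blobber(analyzer=NaiveBayesAnalyzer())  # analyzer is trained once and then reused

for tweet in tweets:
    sent = tb(tweet.decode('utf-8')).sentiment
    # sent is (classification, p_pos, p_neg) with the NaiveBayesAnalyzer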
Upvotes: 17