Reputation: 4699
The built-in classifier in textblob is pretty dumb. It's trained on movie reviews, so I created a huge set of examples in my context (57,000 stories, categorized as positive or negative) and then trained it using nltk.
I tried using textblob to train it but it always failed:
with open('train.json', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="json")
That would run for hours and end in a memory error.
I looked at the source and found it was just using nltk and wrapping that, so I used that instead, and it worked.
The nltk training set needed to be a list of tuples: the first element a Counter of the words in the text and their frequencies, the second element 'pos' or 'neg' for the sentiment.
>>> train_set = [(Counter(i["text"].split()),i["label"]) for i in data[200:]]
>>> test_set = [(Counter(i["text"].split()),i["label"]) for i in data[:200]] # withholding 200 examples for testing later
>>> cl = nltk.NaiveBayesClassifier.train(train_set) # <-- this is the same thing textblob was using
>>> print("Classifier accuracy percent:",(nltk.classify.accuracy(cl, test_set))*100)
('Classifier accuracy percent:', 66.5)
>>> cl.show_most_informative_features(75)
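The tuple shape described above can be checked without nltk at all. This is a minimal sketch, assuming `data` is a list of dicts with "text" and "label" keys as in the session above; the helper names `to_features` and `split_dataset` and the sample records are made up for illustration.

```python
from collections import Counter


def to_features(text):
    """Bag-of-words features: each whitespace-separated token mapped to its count."""
    return Counter(text.split())


def split_dataset(data, n_test=200):
    """Hold out the first n_test examples for testing and train on the rest."""
    labeled = [(to_features(item["text"]), item["label"]) for item in data]
    return labeled[n_test:], labeled[:n_test]


# Hypothetical sample records, only to show the shape of the result.
data = [
    {"text": "great story great ending", "label": "pos"},
    {"text": "boring and slow", "label": "neg"},
    {"text": "loved it", "label": "pos"},
]
train_set, test_set = split_dataset(data, n_test=1)
print(test_set[0])
```

Because Counter is a dict subclass, each tuple is already in the (featureset, label) form that nltk.NaiveBayesClassifier.train accepts.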
Then I pickled it.
with open('storybayes.pickle','wb') as f:
    pickle.dump(cl,f)
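Reading the classifier back is the mirror image of the dump. A minimal sketch, with a plain dict standing in for the trained classifier so that it runs without nltk; the helper names `save_model` and `load_model` are made up. Unpickling the real object needs nltk importable in the loading process, and pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize the model to disk in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously pickled model; the file must be opened as 'rb'."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object; in the post this would be the trained nltk classifier `cl`.
stand_in = {"labels": ["pos", "neg"]}
path = os.path.join(tempfile.mkdtemp(), "storybayes.pickle")
save_model(stand_in, path)
restored = load_model(path)
print(type(restored).__name__)
```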
Next I reopened the pickled file, which gave me back the nltk classifier (<class 'nltk.classify.naivebayes.NaiveBayesClassifier'>), and tried to feed it into textblob. Instead of
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())
I tried:
blob = TextBlob("I love this library", analyzer=cl4)  # cl4 is the unpickled nltk classifier
Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    blob = TextBlob("I love this library", analyzer=cl4)
  File "C:\python\lib\site-packages\textblob\blob.py", line 369, in __init__
    parser, classifier)
  File "C:\python\lib\site-packages\textblob\blob.py", line 323, in _initialize_models
    BaseSentimentAnalyzer, BaseBlob.analyzer)
  File "C:\python\lib\site-packages\textblob\blob.py", line 305, in _validated_param
    .format(name=name, cls=base_class_name))
ValueError: analyzer must be an instance of BaseSentimentAnalyzer
What now? I looked at the source: both are classes, but they are not quite the same.
Upvotes: 6
Views: 1667
Reputation: 4699
I could not establish for certain that an nltk classifier cannot work with textblob, and it would surprise me if it could not, since textblob imports nltk in its source code and is basically a wrapper around it.
But what I did conclude after many hours of testing is that nltk offers a better built-in sentiment analyzer called "vader", which outperformed all of my trained models.
import nltk
nltk.download('vader_lexicon') # do this once: downloads the VADER lexicon
from nltk.sentiment.vader import SentimentIntensityAnalyzer
Analyzer = SentimentIntensityAnalyzer()
Analyzer.polarity_scores("I find your lack of faith disturbing.")
{'neg': 0.491, 'neu': 0.263, 'pos': 0.246, 'compound': -0.4215}
CONCLUSION: NEGATIVE
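One way to turn those scores into a conclusion is to threshold the compound value. This is only a sketch: the function name `label_from_compound` is made up, and the 0.05 cutoff is a commonly cited convention for VADER rather than anything stated in this post, so treat it as an assumption and tune it on your own data.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to a coarse label via its 'compound' value.

    The threshold is an assumed convention, not a value taken from the post.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "POSITIVE"
    if compound <= -threshold:
        return "NEGATIVE"
    return "NEUTRAL"


# The scores reported above for the Darth Vader sentence.
scores = {'neg': 0.491, 'neu': 0.263, 'pos': 0.246, 'compound': -0.4215}
print(label_from_compound(scores))  # prints NEGATIVE
```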
The vader_lexicon and the nltk code do a lot more parsing of negation language in sentences in order to negate positive words. For example, when Darth Vader says "lack of faith", that flips the sentiment to its opposite.
I explained it here, with examples of the better results: https://chewychunks.wordpress.com/2018/06/19/sentiment-analysis-discovering-the-best-way-to-sort-positive-and-negative-feedback/
That replaces this textblob implementation:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
TextBlob("I find your lack of faith disturbing.", analyzer=NaiveBayesAnalyzer()).sentiment
{'neg': 0.182, 'pos': 0.817, 'combined': 0.635}
CONCLUSION: POSITIVE
The vader nltk classifier also has additional documentation here on using it for sentiment analysis: http://www.nltk.org/howto/sentiment.html
TextBlob always crashed my computer with as few as 5,000 examples.
Upvotes: 2
Reputation: 4699
Another more forward-looking solution is to use spaCy to build the model instead of textblob or nltk. This is new to me, but seems a lot easier to use and more powerful:
https://spacy.io/usage/spacy-101#section-lightning-tour
"spaCy is the Ruby on Rails of natural language processing."
import spacy
import random
nlp = spacy.load('en') # loads the trained starter model here
train_data = [("Uber blew through $1 million", {'entities': [(0, 4, 'ORG')]})] # better model stuff
with nlp.disable_pipes(*[pipe for pipe in nlp.pipe_names if pipe != 'ner']):
    optimizer = nlp.begin_training()
    for i in range(10):
        random.shuffle(train_data)
        for text, annotations in train_data:
            nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk('/model')
Upvotes: 0
Reputation: 460
Going over the error message, it seems the analyzer must inherit from the abstract class BaseSentimentAnalyzer. As mentioned in the docs here, this class must implement the analyze(text) function. However, while checking the docs of NLTK's implementation, I could not find this method in its main documentation here or in its parent class ClassifierI here. Hence, I believe the two implementations cannot be combined unless you implement a new analyze function around NLTK's implementation to make it compatible with TextBlob's.
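For what it's worth, such an analyze function could live in a small adapter class rather than inside NLTK. The following is an untested-against-real-data sketch, not TextBlob's or NLTK's own code: `NLTKClassifierAnalyzer` is a made-up name, the stub classes only imitate the `prob_classify()` / `max()` / `prob()` shape of an nltk classifier so the example runs on its own, and the `try/except` fallback exists purely so the sketch can run without textblob installed. It also assumes the classifier was trained on Counter-of-words features with 'pos' and 'neg' labels, as in the question.

```python
from collections import Counter

try:
    from textblob.base import BaseSentimentAnalyzer
except ImportError:
    # Fallback so the sketch runs without textblob installed;
    # TextBlob itself only accepts subclasses of the real base class.
    BaseSentimentAnalyzer = object


class NLTKClassifierAnalyzer(BaseSentimentAnalyzer):
    """Hypothetical adapter exposing an nltk-style classifier via analyze()."""

    def __init__(self, classifier):
        super().__init__()
        self._classifier = classifier

    def analyze(self, text):
        # Build the same bag-of-words features the classifier was trained on.
        features = Counter(text.split())
        dist = self._classifier.prob_classify(features)
        return {
            "classification": dist.max(),
            "p_pos": dist.prob("pos"),
            "p_neg": dist.prob("neg"),
        }


class _StubDist:
    """Stand-in for the probability distribution nltk returns."""

    def __init__(self, probs):
        self._probs = probs

    def max(self):
        return max(self._probs, key=self._probs.get)

    def prob(self, label):
        return self._probs.get(label, 0.0)


class _StubClassifier:
    """Stand-in with the same prob_classify() shape as an nltk classifier."""

    def prob_classify(self, features):
        p_pos = 0.9 if "love" in features else 0.2
        return _StubDist({"pos": p_pos, "neg": 1 - p_pos})


analyzer = NLTKClassifierAnalyzer(_StubClassifier())
print(analyzer.analyze("I love this library")["classification"])  # prints pos
```

With textblob installed and the real unpickled classifier in place of the stub, passing `analyzer=NLTKClassifierAnalyzer(cl)` to TextBlob should get past the isinstance check in the traceback, since the adapter is then a genuine BaseSentimentAnalyzer subclass.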
Upvotes: 0