How to predict Sentiments after training and testing the model by using NLTK NaiveBayesClassifier in Python?

Question

I am doing sentiment classification using the NLTK NaiveBayesClassifier. I trained and tested the model with labeled data. Now I want to predict the sentiment of data that is not labeled, but I run into an error. The line that raises the error is:

score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))

The error is :

ValueError: not enough values to unpack (expected 2, got 1)

Below is the code:

import random
import pandas as pd
data = pd.read_csv("label data for testing .csv", header=0)
sentiment_data = list(zip(data['Articles'], data['Sentiment']))
random.shuffle(sentiment_data)
new_data = pd.read_csv("Japan Data.csv", header=0)
train_x, train_y = zip(*sentiment_data[:350])
test_x, test_y = zip(*sentiment_data[350:])

from unidecode import unidecode
from nltk import word_tokenize
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import extract_unigram_feats

TRAINING_COUNT = 350


def clean_text(text):
    text = text.replace("\n", " ")

    return text


analyzer = SentimentAnalyzer()
vocabulary = analyzer.all_words([(word_tokenize(unidecode(clean_text(instance))))
                                 for instance in train_x[:TRAINING_COUNT]])
print("Vocabulary: ", len(vocabulary))

print("Computing Unigram Features ...")

unigram_features = analyzer.unigram_word_feats(vocabulary, min_freq=10)

print("Unigram Features: ", len(unigram_features))

analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_features)

# Build the training set
_train_X = analyzer.apply_features([(word_tokenize(unidecode(clean_text(instance))))
                                    for instance in train_x[:TRAINING_COUNT]], labeled=False)

# Build the test set
_test_X = analyzer.apply_features([(word_tokenize(unidecode(clean_text(instance))))
                                   for instance in test_x], labeled=False)

trainer = NaiveBayesClassifier.train
classifier = analyzer.train(trainer, zip(_train_X, train_y[:TRAINING_COUNT]))

score = analyzer.evaluate(list(zip(_test_X, test_y)))
print("Accuracy: ", score['Accuracy'])

score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))
print(score_1)

I understand that the problem arises because I have to supply two values on the line that gives the error, but I don't know how to do this.

Thanks in advance.

Osvald Laurits · Accepted Answer

Documentation and example

The line that gives you the error calls the method SentimentAnalyzer.evaluate(...). According to the documentation, this method does the following:

Evaluate and print classifier performance on the test set.

See SentimentAnalyzer.evaluate.

The method has one mandatory parameter: test_set.

test_set – A list of (tokens, label) tuples to use as gold set.

In the example at http://www.nltk.org/howto/sentiment.html test_set has the following structure:

[({'contains(,)': False, 'contains(.)': True, 'contains(and)': False, 'contains(the)': True}, 'subj'), ({'contains(,)': True, 'contains(.)': True, 'contains(and)': False, 'contains(the)': True}, 'subj'), ...]

Here is a symbolic representation of the structure.

[(dictionary,label), ... , (dictionary,label)]

Error in your code

You are passing

list(zip(new_data['Articles']))

to SentimentAnalyzer.evaluate. I assume you're getting the error because

list(zip(new_data['Articles']))

does not create a list of (tokens, label) tuples. You can check this by storing the list in a variable and printing it, or by inspecting the variable in a debugger. For example:

test_set = list(zip(new_data['Articles']))
print("begin test_set")
print(test_set)
print("end test_set")
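To make the failure concrete, here is a minimal sketch in plain Python. The two sample strings and the names articles and error_message are made up for illustration; a plain list stands in for new_data['Articles']. It shows that zip over a single iterable yields 1-tuples, and that unpacking such a tuple into two names raises the same ValueError as in the question. Presumably evaluate performs a similar unpacking internally.

```python
# Hypothetical stand-in for new_data['Articles'] (assumption: a column of strings).
articles = ["Stocks rallied today.", "Exports fell sharply."]

# zip over a single iterable produces 1-tuples, not (tokens, label) pairs.
test_set = list(zip(articles))
print(test_set[0])

# Unpacking each element into two names fails, because there is only one value.
error_message = None
try:
    for tokens, label in test_set:
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Each element of test_set holds only the article text, so there is no label for evaluate to compare against.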

You are calling evaluate correctly three lines above the line that raises the error:

score = analyzer.evaluate(list(zip(_test_X, test_y)))

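evaluate measures performance against gold labels, which your unlabeled data does not have. To predict labels for unlabeled data, you probably want to call SentimentAnalyzer.classify(instance) instead. See SentimentAnalyzer.classify.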
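As a rough illustration of the classify call, here is a minimal sketch on a toy corpus. The training documents, the labels "pos" and "neg", the whitespace tokenization via str.split, and the names train_docs, unlabeled_articles and predictions are all assumptions made for the example; they are not taken from your data.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import extract_unigram_feats

# Toy labeled data (assumption): each item is (list_of_tokens, label).
train_docs = [
    ("good great excellent".split(), "pos"),
    ("great good nice".split(), "pos"),
    ("bad awful terrible".split(), "neg"),
    ("terrible bad poor".split(), "neg"),
]

analyzer = SentimentAnalyzer()

# Build unigram features from the training vocabulary, as in the question.
vocabulary = analyzer.all_words([tokens for tokens, _ in train_docs])
unigram_features = analyzer.unigram_word_feats(vocabulary)
analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_features)

# Train on (featureset, label) pairs.
train_set = analyzer.apply_features(train_docs, labeled=True)
analyzer.train(NaiveBayesClassifier.train, train_set)

# Unlabeled texts (assumption): classify takes one tokenized document at a time
# and returns the predicted label; no gold label is needed.
unlabeled_articles = ["good great service", "awful terrible weather"]
predictions = [analyzer.classify(article.split()) for article in unlabeled_articles]

for article, label in zip(unlabeled_articles, predictions):
    print(label, "<-", article)
```

In your code, the same idea would mean looping over new_data['Articles'] and passing word_tokenize(unidecode(clean_text(article))) to analyzer.classify for each article, so that new documents go through the same preprocessing as the training data.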
