Aigerim Sadir

Reputation: 463

How to get the subjectivity score of a text in NLTK?

I need a method in NLTK that calculates a subjectivity score (a real number) for a text. Is there anything like that in NLTK?

some_magic_method(my_text):
    ...

# 0.34

Upvotes: 2

Views: 1393

Answers (2)

Pie-ton

Reputation: 562

The short answer is "No." At the moment, there is no method in NLTK that yields a numeric value for subjectivity. The only package that reports a numeric value for subjectivity is TextBlob.
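For reference, a minimal sketch of the TextBlob approach mentioned above (assumes you have run pip install textblob):

from textblob import TextBlob

blob = TextBlob("The plot twist felt like a cheap, lazy trick.")
print(blob.sentiment.subjectivity)  # a float in [0.0, 1.0]; higher means more subjective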

That said, the function nltk.sentiment.util.demo_sent_subjectivity() reports subjectivity using a dataset developed by Pang and Lee (2004) containing 5000 subjective and 5000 objective processed sentences from movie reviews. As I said, unlike TextBlob, it only labels statements (or bags of words) as either subjective or objective and does not assign a numeric value to them.

While the default classifier is not mentioned explicitly, I "think" this module uses a naive Bayes classifier, which can be changed. You can find the documentation of this module here. Also, here is one example provided by NLTK.
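For illustration, a minimal sketch of calling that demo function; note that it prints "subj" or "obj" rather than a score, and the first run trains a Naive Bayes model on the Pang and Lee corpus, so it takes a while:

import nltk
from nltk.sentiment.util import demo_sent_subjectivity

nltk.download('subjectivity')  # the Pang and Lee (2004) corpus used for training
demo_sent_subjectivity("The plot twist felt like a cheap, lazy trick.")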

Upvotes: 1

Milo Knell

Reputation: 164

A simple Google search yields https://www.nltk.org/api/nltk.sentiment.html, which has a subjectivity predictor. It works in the context of sentiment; if you are looking for something divorced from that, you could look at the Pang and Lee (2004) dataset. Using a simple count-vectorized SVM, I got 90% accuracy on it. Here is a snippet of code defining the class (from my GitHub); if you want the entire code, I can supply more.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score


class ObjectivityDetector:
    '''SVM predicts the objectivity/subjectivity of a sentence. Trained on pang/lee 2004 with NER removal. Pre-grid searched and 5 fold validated and has a 90% accuracy and 0.89 F1 macro'''
    def __init__(self, train, model_file=None):
        self.pipeline = Pipeline(
            [
                ('vect', CountVectorizer()),
                ('tfidf', TfidfTransformer()),
                ('clf', CalibratedClassifierCV(  # calibrated CV wrapping SGD to get probability outputs
                        SGDClassifier(
                            loss='hinge',
                            penalty='l2',
                            alpha=1e-4,
                            max_iter=1000,
                            learning_rate='optimal',
                            tol=None,),
                        cv=5)),
            ]
        )
        self.train(train)

    def train(self, train):
        # expects a DataFrame-like object with 'text' and 'truth' columns
        learner = self.pipeline.fit(train['text'], train['truth'])
        self.learner = learner

    def predict(self, test):
        predicted = self.learner.predict(test)
        probs = self.learner.predict_proba(test)
        certainty = certainty_(probs)  # helper from the full repository (not shown in this snippet)
        return predicted, certainty

    def score(self, predicted, test):
        # predicted is the (labels, certainty) tuple returned by predict()
        acc = accuracy_score(test['truth'].to_numpy(), predicted[0]) * 100
        f1 = f1_score(test['truth'].to_numpy(), predicted[0], average='macro')
        print("Accuracy: {}\nMacro F1-score: {}".format(acc, f1))
        return acc, f1
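The snippet references a certainty_ helper that is not shown here. As a rough usage sketch (not the author's exact code), the following defines a hypothetical stand-in for it, in the same script as the class, and builds the 'text'/'truth' DataFrame the class expects from NLTK's copy of the Pang and Lee (2004) subjectivity corpus:

import nltk
import numpy as np
import pandas as pd
from nltk.corpus import subjectivity
from sklearn.model_selection import train_test_split

def certainty_(probs):
    # hypothetical stand-in for the author's helper: confidence = highest class probability per row
    return np.max(probs, axis=1)

nltk.download('subjectivity')  # Pang and Lee (2004) corpus shipped with NLTK

# 5000 subjective + 5000 objective sentences, joined back into plain strings
rows = [(' '.join(sent), label)
        for label in ('subj', 'obj')
        for sent in subjectivity.sents(categories=label)]
df = pd.DataFrame(rows, columns=['text', 'truth'])

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
detector = ObjectivityDetector(train_df)
predicted = detector.predict(test_df['text'])
detector.score(predicted, test_df)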

Upvotes: 0
