Deb Prakash Chatterjee

Reputation: 113

How to predict all classes in a multi class Sentiment Analysis problem using SVM?

I am building a sentiment analysis classifier with three classes (labels): positive, neutral and negative. My training data has shape (14640, 15), with the following class counts:

negative    9178
neutral     3099
positive    2363

I pre-processed the tweets to standardize the text and applied bag-of-words vectorization so that they can be fed to the model; the resulting feature matrix has shape (14640, 1000). Because the labels Y are in text form, I applied LabelEncoder to turn them into a single integer array, like this:

[1 2 1 ... 1 0 1]
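To read the encoded array, it helps to know which integer stands for which label. The following is a minimal sketch on a made-up list of labels (not the actual dataset); it relies on LabelEncoder storing its classes in sorted order, so for these three label names the mapping should be negative = 0, neutral = 1, positive = 2.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of text labels, for illustration only.
labels = ["neutral", "positive", "neutral", "negative", "neutral"]

encoder = LabelEncoder()
Y = encoder.fit_transform(labels)

# classes_ is kept in sorted order, so the position of each name is its code.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
print(Y)
```

With this mapping, the "third class" that goes missing later in the question (label 2) would be the positive class.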

This is how I split my dataset -

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, stratify=Y, random_state=42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

out:(10248, 1000) (10248,)
(4392, 1000) (4392,)
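As a rough check of what stratify does in a split like this, the sketch below rebuilds only the label array from the class counts given above (the features are a placeholder, since only the labels matter here) and compares the class shares before and after splitting. The helper class_shares is a name introduced for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as in the question.
Y = np.repeat([0, 1, 2], [9178, 3099, 2363])
X = np.zeros((len(Y), 1))  # placeholder features

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=42
)

def class_shares(y):
    """Fraction of samples belonging to each of the three classes."""
    return np.bincount(y, minlength=3) / len(y)

print(class_shares(Y))
print(class_shares(Y_train))
print(class_shares(Y_test))
```

The three printed rows should be nearly identical: stratification keeps the original imbalance in both splits rather than removing it. Any reweighting comes from class_weight='balanced' in the classifier, not from the split.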

stratify=Y keeps the class proportions the same in the training and test sets; it does not rebalance the data. For the classifier, I used an SVM:

svc = svm.SVC(kernel='linear', C=1, probability=True, class_weight='balanced').fit(X_train, Y_train) 
prediction = svc.predict_proba(X_test) 
prediction_int = prediction[:,1] >= 0.3 
prediction_int = prediction_int.astype(np.int) 
print(prediction_int)
print('Precision score: ', precision_score(Y_test, prediction_int, average=None))
print('Accuracy Score: ', accuracy_score(Y_test, prediction_int))

out:[0 0 0 ... 1 0 0]
Precision score:  [0.74185137 0.50075529 0.        ]
Accuracy Score:  0.6691712204007286
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

@desertnaut helped me narrow down the actual problem: the classifier never predicts the third class. As you can see from the printed prediction_int, it contains no label 2, and it is nowhere near the actual labels. I am worried that I made a mistake in the classification step. I originally wrote this classifier for binary classification and assumed I would not need to change it for multi-class classification. Can anyone help me solve this?

Upvotes: 2

Views: 1035

Answers (1)

PV8

Reputation: 6260

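The problem is that the way you are using predict_proba only works for binary classification. In a multi-class problem it returns one probability column per class, so thresholding column 1 only tells you whether class 1 is likely; the result is always 0 or 1 and can never be 2.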

You cannot use this command:

prediction_int = prediction[:,1] >= 0.3 
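To illustrate, here is a small sketch with a hand-written probability matrix standing in for the output of predict_proba (the numbers are made up, and the columns are assumed to follow svc.classes_, i.e. 0, 1, 2). It compares the binary-style threshold with picking the most probable class per row.

```python
import numpy as np

# Hypothetical predict_proba output: four samples, three classes.
proba = np.array([
    [0.70, 0.20, 0.10],
    [0.15, 0.60, 0.25],
    [0.10, 0.15, 0.75],
    [0.25, 0.35, 0.40],
])

# Binary-style thresholding on column 1: the result can only be 0 or 1.
thresholded = (proba[:, 1] >= 0.3).astype(int)

# Multi-class decision: index of the most probable class in each row.
most_probable = proba.argmax(axis=1)

print(thresholded)
print(most_probable)
```

The thresholded array never contains a 2, which matches the symptom in the question. Note that for SVC the probabilities are calibrated separately from the decision function, so the argmax of predict_proba may occasionally disagree with predict; calling predict directly is usually the simpler choice.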

For further information, see this similar post: Multiclass Classification and probability prediction

Update

I got it working by replacing all of the prediction code with this single line:

pred = svc.predict(X_test)  

As the answer says, I was previously using the prediction logic from my binary classifier. With predict, the model can now output all three labels, so my precision and recall now work as expected.
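For reference, a minimal end-to-end sketch of this fix might look like the following. It uses a synthetic three-class dataset from make_classification as a stand-in for the bag-of-words matrix (the sizes and class weights are assumptions chosen for illustration) and leaves out probability=True, since predict does not need it.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, precision_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real features: three imbalanced classes.
X, Y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3,
    weights=[0.6, 0.25, 0.15], random_state=42,
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=42
)

svc = SVC(kernel="linear", C=1, class_weight="balanced").fit(X_train, Y_train)

# predict returns one class label per sample, drawn from svc.classes_.
pred = svc.predict(X_test)

# average=None gives one precision value per class instead of a single number.
per_class_precision = precision_score(Y_test, pred, average=None)
print(per_class_precision)
print(classification_report(Y_test, pred))
```

classification_report prints precision, recall and F1 for each class in one table, which makes it easy to spot a class that is never predicted.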

Upvotes: 1
