How to get the precision score of every class in a Multi class Classification Problem?

Question

I am making Sentiment Analysis Classification and I am doing it with Scikit-learn. This has 3 labels, positive, neutral and negative. The Shape of my training data is (14640, 15), where

negative    9178
neutral     3099
positive    2363

I have pre-processed the data and applied the bag-of-words word vectorization technique to the text of twitter as there many other attributes too, whose size is then (14640, 1000). As the Y, means the label is in the text form so, I applied LabelEncoder to it. This is how I split my dataset -

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, random_state=42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

out: (10248, 1000) (10248,)
     (4392, 1000) (4392,)

And this is my classifier

svc = svm.SVC(kernel='linear', C=1, probability=True).fit(X_train, Y_train) 
prediction = svc.predict_proba(X_test) 
prediction_int = prediction[:,1] >= 0.3 
prediction_int = prediction_int.astype(np.int) 
print('Precision score: ', precision_score(Y_test, prediction_int, average=None))
print('Accuracy Score: ', accuracy_score(Y_test, prediction_int))

out:Precision score:  [0.73980398 0.48169243 0.        ]
Accuracy Score:  0.6675774134790529
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

Now I am not sure why the third one, in precision score is blank? I have applied average=None, because to make a separate precision score for every class. Also, I am not sure about the prediction, if it is right or not, because I wrote it for binary classification? Can you please help me to debug it to make it better. Thanks in advance.

desertnaut · Accepted Answer

As the warning explains:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.

it seems that one of your 3 classes is missing from your predictions prediction_int (i.e. you never predict it); you can easily check if this is the case with

set(Y_test) - set(prediction_int)

which should be the empty set {} if this is not the case.

If this is indeed the case, and the above operation gives {1} or {2}, the most probable reason is that your dataset is imbalanced (you have much more negative samples), and you do not ask for a stratified split; modify your train_test_split to

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, stratify=Y, random_state=42)

and try again.

UPDATE (after comments):

As it turns out, you have a class imbalance problem (and not a coding issue) which prevents your classifier from successfully predicting your 3rd class (positive). Class imbalance is a huge sub-topic in itself, and there are several remedies proposed. Although going into more detail is arguably beyond the scope of a single SO thread, the first thing you should try (on top of the suggestions above) is to use the class_weight='balanced' argument in the definition of your classifier, i.e.:

svc = svm.SVC(kernel='linear', C=1, probability=True, class_weight='balanced').fit(X_train, Y_train)

For more options, have a look at the dedicated imbalanced-learn Python library (part of the scikit-learn-contrib projects).

How to get the precision score of every class in a Multi class Classification Problem?

Answers (1)

Related Questions