ihadanny

Reputation: 4483

scikit-learn metrics on a subset of classes

We're using scikit-learn==0.15.2 and training LinearSVC on 9 classes plus a special 'others' class. The 'others' class contains anything in our dataset that does not fit into the 9 important classes we are trying to classify.

We would like to get average micro/macro precision/recall/f1 metrics on only the 9 classes, without the 'others' class, in order to get a performance estimation for our classifier.

We've failed to find any support for that in the built-in scikit-learn metrics functions, and even the classification_report function has an issue when trying to restrict the labels to only the 9 (https://github.com/scikit-learn/scikit-learn/issues/3123).

Does this lack of support indicate that our fundamental approach isn't correct? Should we include the 'others' class when we measure performance?

EDIT: Note that our consumer uses our predictions only when we predict one of the 9 classes. If we predict 'others', our output is thrown away and another model is used.

Upvotes: 2

Views: 1476

Answers (2)

klubow

Reputation: 431

Why not use a confusion matrix (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)?

Based on that matrix, you can compute your own metrics.
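A minimal sketch of that idea, assuming toy labels 'a', 'b' and 'others' in place of the real 9 classes. The helper names `subset_metrics` and `_ratio` are made up for illustration and are not part of scikit-learn. Rows of the confusion matrix are true labels and columns are predicted labels, so an 'others' sample predicted as 'a' still counts against the precision of 'a', even though 'others' itself is left out of the averages.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def _ratio(num, den):
    """Elementwise num / den for 1-D arrays, defining x / 0 as 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)
    return out


def subset_metrics(y_true, y_pred, all_labels, keep_labels):
    """Micro/macro precision, recall and F1 over keep_labels only (hypothetical helper)."""
    cm = confusion_matrix(y_true, y_pred, labels=all_labels)
    idx = [all_labels.index(label) for label in keep_labels]

    tp = np.diag(cm)[idx].astype(float)       # correct predictions per kept class
    n_pred = cm.sum(axis=0)[idx].astype(float)  # column sums: times each class was predicted
    n_true = cm.sum(axis=1)[idx].astype(float)  # row sums: true occurrences of each class

    # Per-class scores, then an unweighted mean for the macro average.
    p, r = _ratio(tp, n_pred), _ratio(tp, n_true)
    f = _ratio(2 * p * r, p + r)
    macro = {"precision": p.mean(), "recall": r.mean(), "f1": f.mean()}

    # Micro average: pool the counts over the kept classes first.
    mp = tp.sum() / n_pred.sum() if n_pred.sum() > 0 else 0.0
    mr = tp.sum() / n_true.sum() if n_true.sum() > 0 else 0.0
    mf = 2 * mp * mr / (mp + mr) if (mp + mr) > 0 else 0.0
    micro = {"precision": mp, "recall": mr, "f1": mf}

    return {"micro": micro, "macro": macro}


# Toy data (an assumption, not from the question's dataset).
y_true = ["a", "a", "b", "b", "others", "others"]
y_pred = ["a", "others", "b", "a", "a", "others"]
all_labels = ["a", "b", "others"]
keep = ["a", "b"]

result = subset_metrics(y_true, y_pred, all_labels, keep)
print(result["micro"])
print(result["macro"])
```

Whether 0/0 should be scored as 0 is a design choice; the sketch uses 0 so that a class that is never predicted drags the macro average down instead of raising an error.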

Upvotes: 3

lejlot

Reputation: 66805

In short, yes, you should include each class. Why would you ignore what is probably the biggest class? Even if it is just noise, being able to distinguish that noise from the important classes is fundamental to a classifier's performance. There might be situations where you are not interested in the "others" class (cases where false positives are irrelevant), but these situations are quite rare and so are not directly implemented in scikit-learn's metrics module.
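For readers on a newer release than the 0.15.2 mentioned in the question: as far as I know, later versions of scikit-learn (I believe 0.17 onward) let `precision_recall_fscore_support` combine `labels` with `average`, so the averages can be restricted to a subset of classes. A small sketch under that assumption, with toy labels 'a', 'b' and 'others' in place of the real classes:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy data (an assumption, not from the question's dataset).
y_true = ["a", "a", "b", "b", "others", "others"]
y_pred = ["a", "others", "b", "a", "a", "others"]
important = ["a", "b"]  # classes to average over; 'others' is left out

# Each call returns (precision, recall, f1, support); support is None when averaging.
micro = precision_recall_fscore_support(
    y_true, y_pred, labels=important, average="micro"
)
macro = precision_recall_fscore_support(
    y_true, y_pred, labels=important, average="macro"
)

print("micro P/R/F1:", micro[:3])
print("macro P/R/F1:", macro[:3])
```

Note that 'others' samples wrongly predicted as 'a' or 'b' still lower the precision of those classes, so the noise class is not ignored entirely; only its own per-class score is excluded from the averages.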

Upvotes: 3
