Reputation: 603
Does f1 score really depend on which class is given the positive label?
When I use scikit-learn's f1 metric, the score changes when I swap the labels:
>>> from sklearn import metrics as m
>>> m.f1_score([0,0,0,1,1,1],[0,0,0,1,1,0])
0.8
>>> m.f1_score([1,1,1,0,0,0],[1,1,1,0,0,1])
0.8571428571428571
The only difference between the first and second case is that 0 and 1 have been swapped. But I get a different answer.
This seems really bad. It means that if I'm reporting the f1 score for a cat/dog classifier, the value depends on whether cats or dogs get the positive label.
Is this really true, or did I mess something up?
Upvotes: 1
Views: 692
Reputation: 14654
For multiclass classification you should use a cross-entropy measure. Cross-entropy is invariant to relabeling, because relabeling only reorders the terms in its summation.
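As a quick check, here is a minimal sketch using scikit-learn's log_loss; the predicted probabilities are made up for illustration:

from sklearn import metrics as m
# made-up predicted probabilities of the positive class
p = [0.1, 0.2, 0.1, 0.9, 0.8, 0.4]
a = m.log_loss([0, 0, 0, 1, 1, 1], p)
# swap the labels and, correspondingly, the probabilities
b = m.log_loss([1, 1, 1, 0, 0, 0], [1 - x for x in p])
print(a, b)  # identical: relabeling only reorders the summation terms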
If you want to use the F1 score, be aware that it is invariant to label swapping if, and only if, the number of true positives equals the number of true negatives.
In your example there are 3 true negatives and 2 true positives. If I remove one true negative, so that tp = tn = 2, the F1 score is the same after swapping labels:
m.f1_score([1,1,0,0,1],[1,1,0,0,0]) # 0.8
m.f1_score([0,0,1,1,0],[0,0,1,1,1]) # 0.8
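You can verify the counts with a small sketch using scikit-learn's confusion_matrix (the tn, fp, fn, tp unpacking follows sklearn's row/column order for binary labels):

from sklearn.metrics import confusion_matrix
# your original example: 3 true negatives, 2 true positives, 1 false negative
tn, fp, fn, tp = confusion_matrix([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0]).ravel()
print(tn, fp, fn, tp)  # 3 0 1 2
# trimmed example: tp == tn == 2, so F1 survives the label swap
tn, fp, fn, tp = confusion_matrix([1, 1, 0, 0, 1], [1, 1, 0, 0, 0]).ravel()
print(tn, fp, fn, tp)  # 2 0 1 2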
Let's start with one formula from the [Wikipedia F-score page], in order to skip some steps:

F1 = tp / (tp + (fn + fp)/2)

where tp is the number of true positives, fn the number of false negatives, and fp the number of false positives. Note that the number of true negatives, tn, does not appear in the formula at all. I will use a prime (') to denote the measures for swapped labels. Swapping labels turns each outcome into its mirror image: tn' = tp, fn' = fp, fp' = fn, tp' = tn.

If you want F1' = F1, you need

tp / (tp + (fn + fp)/2) = tp' / (tp' + (fn' + fp')/2) = tn / (tn + (fn + fp)/2).

Since x / (x + c) is strictly increasing in x for c > 0, this is satisfied if, and only if, tp = tn (assuming fn + fp > 0; a perfect classifier scores 1 under either labeling).
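As a sanity check, here is a short sketch plugging the counts from your original example into that formula (f1_from_counts is just a helper name I made up):

from sklearn import metrics as m

def f1_from_counts(tp, fn, fp):
    # F1 = tp / (tp + (fn + fp) / 2); tn never enters
    return tp / (tp + (fn + fp) / 2)

# original labeling: tp = 2, fn = 1, fp = 0 (and tn = 3)
print(f1_from_counts(2, 1, 0))  # 0.8
print(m.f1_score([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0]))  # 0.8
# swapped labeling: tp' = tn = 3, fn' = fp = 0, fp' = fn = 1
print(f1_from_counts(3, 0, 1))  # 0.8571428571428571
print(m.f1_score([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 1]))  # 0.8571428571428571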
Upvotes: 2