Reputation: 2193
I am using sklearn to compute the macro F1 score, and I suspect there may be a bug in the code. Here is an example (label 0 is ignored):
from sklearn.metrics import f1_score, precision_recall_fscore_support
y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]
p_macro, r_macro, f_macro, support_macro = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average='macro')
p_micro, r_micro, f_micro, support_micro = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average='micro')
def f(p, r):
    return 2*p*r/(p+r)
my_f_macro = f(p_macro, r_macro)
my_f_micro = f(p_micro, r_micro)
print('my f macro {}'.format(my_f_macro))
print('my f micro {}'.format(my_f_micro))
print('macro: p {}, r {}, f1 {}'.format(p_macro, r_macro, f_macro))
print('micro: p {}, r {}, f1 {}'.format(p_micro, r_micro, f_micro))
The output:
my f macro 0.6361290322580646
my f micro 0.6153846153846153
macro: p 0.725, r 0.5666666666666667, f1 0.6041666666666666
micro: p 0.6666666666666666, r 0.5714285714285714, f1 0.6153846153846153
As you can see, sklearn gives 0.6041666666666666 for the macro F1. However, it does not equal 2*0.725*0.566666666/(0.725+0.566666666), where 0.725 and 0.566666666 are the macro precision and macro recall computed by sklearn.
Upvotes: 0
Views: 7793
Reputation: 36599
There's a difference in how the 'macro' and 'micro' averages are calculated. As given in the documentation of f1_score:
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
In macro, the precision, recall and f1 are computed for each class individually, and then their unweighted mean is returned. So you cannot expect to apply your formula f(p, r) to the macro-averaged precision and recall: the mean of the per-class F1 scores is not the same as the harmonic mean of the mean precision and mean recall. The sketch below makes this concrete.
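A minimal sketch, reusing the y_true and y_pred from your question: average=None returns one value per label, and averaging the per-label F1 scores reproduces sklearn's macro F1, while plugging the averaged precision and recall into your formula reproduces my_f_macro instead.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]
# average=None returns per-label arrays, ordered as in `labels`
p, r, f1, _ = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average=None)
print(f1)           # [0.75       0.66666667 0.5        0.5       ]
print(np.mean(f1))  # 0.6041666666666666 -- sklearn's macro f1
print(2*np.mean(p)*np.mean(r)/(np.mean(p)+np.mean(r)))
                    # 0.6361290322580646 -- your my_f_macro, a different number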
In micro, the f1 is calculated from the final precision and recall, i.e. from true positive, false positive and false negative counts pooled globally over all classes. That is why it matches the score you calculate in my_f_micro.
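To see why the harmonic-mean formula does work for micro, here is a sketch of that global computation, again reusing the data from your question (the explicit counting loop is my illustration of what the docs describe, not sklearn's internal code):
import numpy as np
y_true = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4])
y_pred = np.array([1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3])
labels = [1, 2, 3, 4]
# pool the counts over all labels first, then compute p/r/f1 once
tp = sum(np.sum((y_true == c) & (y_pred == c)) for c in labels)  # 8
fp = sum(np.sum((y_true != c) & (y_pred == c)) for c in labels)  # 4
fn = sum(np.sum((y_true == c) & (y_pred != c)) for c in labels)  # 6
p_micro = tp / (tp + fp)  # 8/12 = 0.6666...
r_micro = tp / (tp + fn)  # 8/14 = 0.5714...
print(2*p_micro*r_micro/(p_micro + r_micro))  # 0.6153846153846153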
Hope it makes sense.
For more explanation, you can read the answer here:
Upvotes: 3