Zhao

Reputation: 2193

Computing macro f1 score using sklearn

I am using sklearn to compute the macro F1 score, and I suspect there may be a bug in my code. Here is an example (label 0 is ignored):

from sklearn.metrics import f1_score, precision_recall_fscore_support

y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]

p_macro, r_macro, f_macro, support_macro \
    = precision_recall_fscore_support(y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average='macro')

p_micro, r_micro, f_micro, support_micro\
    = precision_recall_fscore_support(y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average='micro')

def f(p, r):
    return 2*p*r/(p+r)

my_f_macro = f(p_macro, r_macro)

my_f_micro = f(p_micro, r_micro)

print('my f macro {}'.format(my_f_macro))

print('my f micro {}'.format(my_f_micro))

print('macro: p {}, r {}, f1 {}'.format(p_macro, r_macro, f_macro))

print('micro: p {}, r {}, f1 {}'.format(p_micro, r_micro, f_micro))

The output:

my f macro 0.6361290322580646
my f micro 0.6153846153846153
macro: p 0.725, r 0.5666666666666667, f1 0.6041666666666666
micro: p 0.6666666666666666, r 0.5714285714285714, f1 0.6153846153846153

As you can see, sklearn gives 0.6041666666666666 for the macro F1. However, it does not equal 2*0.725*0.5666666/(0.725+0.5666666), where 0.725 and 0.5666666 are the macro precision and macro recall computed by sklearn.

Upvotes: 0

Views: 7793

Answers (1)

Vivek Kumar

Reputation: 36599

There's a difference in the procedure used to calculate the 'macro' and 'micro' averages.

As given in the documentation of f1_score:

'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.

'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

With 'macro', the precision, recall and F1 are computed for each class individually and then their unweighted means are returned. So you cannot expect your formula f(p, r) to reproduce the macro F1 from the macro precision and recall: the mean of the per-class F1 scores is not the same as the F1 of the mean precision and mean recall, as the sketch below shows.
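Here is a minimal sketch, reusing the y_true / y_pred from the question, that recovers sklearn's macro numbers from the per-class scores (average=None returns one value per label):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]

# average=None returns per-label arrays instead of a single aggregate
p, r, f1, _ = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average=None)

print(f1)                      # per-class F1: [0.75, 0.6667, 0.5, 0.5]
print(np.mean(p), np.mean(r))  # 0.725, 0.56667 -> the 'macro' p and r above
print(np.mean(f1))             # 0.604167       -> the 'macro' f1 above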

With 'micro', the F1 is calculated from the global precision and recall (obtained by pooling the counts over all classes), so it matches the score you calculate in my_f_micro. A sketch of that computation follows below.
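For the same example, pooling the true positive / false positive / false negative counts over the requested labels and then applying the usual formulas gives exactly sklearn's micro F1:

y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]
labels = [1, 2, 3, 4]

# pool counts over all requested labels
tp = sum(1 for t, p in zip(y_true, y_pred) if t == p and t in labels)
fp = sum(1 for t, p in zip(y_true, y_pred) if p in labels and t != p)
fn = sum(1 for t, p in zip(y_true, y_pred) if t in labels and t != p)

p_micro = tp / (tp + fp)   # 8 / 12 = 0.6667
r_micro = tp / (tp + fn)   # 8 / 14 = 0.5714
print(2 * p_micro * r_micro / (p_micro + r_micro))  # 0.61538..., same as sklearn's micro f1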

Hope it makes sense.

For more explanation, you can read the answer here:

Upvotes: 3
