Reputation: 547
I'm working on a multiclass classification problem using python and scikit-learn. Currently, I'm using the classification_report
function to evaluate the performance of my classifier, obtaining reports like the following:
>>> print(classification_report(y_true, y_pred, target_names=target_names))
             precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

avg / total        0.70      0.60      0.61         5
To do further analysis, I'm interested in obtaining the per-class F1 score for each of the available classes. Maybe something like this:
>>> print(calculate_f1_score(y_true, y_pred, target_class='class 0'))
0.67
Is there something like that available in scikit-learn?
Upvotes: 19
Views: 26997
Reputation: 3829
NumPy operations on a confusion matrix are not terribly complex, so if you don't want or need to include the scikit-learn dependency, you can achieve all these results with only NumPy.
import numpy as np
C = make_confusion_matrix(data, labels, num_classes) # exercise for the reader
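In case you want to build the confusion matrix without scikit-learn as well, here is one possible sketch of that helper (my own, assuming integer labels in 0..num_classes-1 and the rows-are-predictions convention used below; the argument names preds and truth are placeholders):
def make_confusion_matrix(preds, truth, num_classes):
    # rows index the predicted class, columns the true class
    C = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(C, (np.asarray(preds), np.asarray(truth)), 1)  # count each (predicted, true) pair
    return C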
The typical explanation for F1 first classifies each item in the confusion matrix as one of true positive, false positive, or false negative:
TP = np.diag(C)     # true positives
FP = C.sum(1) - TP  # false positives (row sums = everything predicted as each class)
precision = TP / (TP + FP)
FN = C.sum(0) - TP  # false negatives (column sums = all true members of each class)
recall = TP / (TP + FN)
Recognizing that TP + FP = TP + (C.sum(1) - TP) = C.sum(1), we can simplify the precision to precision = TP / C.sum(1). The recall simplifies the same way to TP / C.sum(0), resulting in this F1 calculation for each class:
TP = np.diag(C) # true positives
precision = TP/C.sum(1)
recall = TP/C.sum(0)
F1c = (2*precision*recall) / (precision+recall) # per-class F1 score
For some use cases individual class F1 scores are all you need, but we can also compute a micro-averaged F1 score to summarize the quality across all classes with a single number.
Micro-averaged F-measure gives equal weight to each document and is therefore considered as an average over all the document/category pairs. It tends to be dominated by the classifier’s performance on common categories. link
precision_micro = TP.sum() / C.sum(1).sum()
recall_micro = TP.sum() / C.sum(0).sum()
micro_F1 = (2*precision_micro*recall_micro) / (precision_micro+recall_micro)
But if we note that C.sum(1).sum() == C.sum(0).sum() == C.sum(), this simplifies to:
# pm = TP.sum() / C.sum() # micro precision
# rm = TP.sum() / C.sum() # micro recall
m = TP.sum() / C.sum() # just use one variable since pm == rm
micro_F1 = (2*m*m) / (m+m)
# = ((2*m)*m) / (2*m) # factor (2*m) from numerator and denominator
# = m
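In other words, for single-label multiclass data the micro-averaged F1 collapses to m = TP.sum() / C.sum(), which is just the overall accuracy.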
Similar to the micro-averaged F1 score, we can also compute a macro-averaged F1 score, which gives a different perspective about the overall performance of the model.
Macro-averaged F-measure gives equal weight to each category, regardless of its frequency. It is influenced more by the classifier’s performance on rare categories. link
This is the mean of the per-class F1 scores defined as F1c above:
F1c.mean()
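As an optional sanity check (assuming you do have scikit-learn available after all, that y_true and y_pred are the label arrays C was built from, and that no class has an empty row or column in C), these values should match sklearn.metrics.f1_score:
from sklearn.metrics import f1_score
print(np.allclose(F1c, f1_score(y_true, y_pred, average=None)))           # per-class
print(np.isclose(micro_F1, f1_score(y_true, y_pred, average='micro')))    # micro
print(np.isclose(F1c.mean(), f1_score(y_true, y_pred, average='macro')))  # macro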
The sklearn.metrics.f1_score function has an option called zero_division, so you can choose a replacement value in case a denominator contains zeros. We can replicate this by adding np.nan_to_num to the division operations:
nan_fill_value = 0
precision = np.nan_to_num(TP / C.sum(1), nan=nan_fill_value)
# same for each denominator that may contain zeros
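For completeness, a sketch of the same guard applied to the remaining divisions (reusing the definitions from above):
recall = np.nan_to_num(TP / C.sum(0), nan=nan_fill_value)
F1c = np.nan_to_num((2 * precision * recall) / (precision + recall), nan=nan_fill_value)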
Upvotes: 0
Reputation: 1151
I would use the f1_score function along with the labels argument:
from sklearn.metrics import f1_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
labels = [0, 1, 2]
f1_scores = f1_score(y_true, y_pred, average=None, labels=labels)
f1_scores_with_labels = {label:score for label,score in zip(labels, f1_scores)}
Outputs:
{0: 0.8, 1: 0.0, 2: 0.0}
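If you then want the score of a single class, in the spirit of the calculate_f1_score call from the question, just index the resulting dictionary by its label:
f1_scores_with_labels[0]  # 0.8, i.e. the F1 score of class 0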
Upvotes: 4
Reputation: 11
You just need to use the pos_label parameter and assign it the class value you want to report (note that pos_label applies when average='binary', i.e. for a binary target).
f1_score(ytest, ypred_prob, pos_label=0)  # default is pos_label=1
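For reference, a minimal sketch of what this looks like on a binary problem (the arrays are made up for illustration):
from sklearn.metrics import f1_score

y_true = [0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(f1_score(y_true, y_pred, pos_label=1))  # F1 of class 1 (the default)
print(f1_score(y_true, y_pred, pos_label=0))  # F1 of class 0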
Upvotes: 1
Reputation: 1234
If you only have the confusion matrix C, with rows corresponding to predictions and columns corresponding to truth, you can compute a support-weighted average F1 score using the following function:
import numpy as np

def f1(C):
    num_classes = np.shape(C)[0]
    f1_score = np.zeros(shape=(num_classes,), dtype='float32')
    weights = np.sum(C, axis=0) / np.sum(C)  # per-class support (column sums) as weights
    for j in range(num_classes):
        not_j = np.concatenate((np.arange(0, j), np.arange(j + 1, num_classes)))
        tp = C[j, j]
        fp = np.sum(C[j, not_j])  # predicted as class j, but actually another class
        fn = np.sum(C[not_j, j])  # actually class j, but predicted as another class
        # tn = np.sum(C[np.ix_(not_j, not_j)])  # true negatives (unused)
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1_score[j] = 2 * precision * recall / (precision + recall) * weights[j] if (precision + recall) > 0 else 0
    f1_score = np.sum(f1_score)  # sum of support-weighted per-class scores
    return f1_score
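For example, if you do build C with scikit-learn, keep in mind that sklearn.metrics.confusion_matrix puts the truth on the rows, so transpose it to match this function's rows-are-predictions convention (a sketch with made-up labels):
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
C = confusion_matrix(y_true, y_pred).T  # transpose so rows = predictions, columns = truth
print(f1(C))  # single support-weighted F1 value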
Upvotes: 0