Reputation:
I want to evaluate a clustering method on some synthetic data using several evaluation scores (NMI, ARI, F1) from sklearn. While NMI and ARI work fine, I have a problem with the F1 score when the labels are switched, e.g., the true labels are [0, 0, 0, 1, 1, 1] and the predicted labels are [1, 1, 1, 0, 0, 0]. For clustering this is a perfect result, as both clusters have been correctly identified and only the labels are switched: cluster 1 has the label 0 and vice versa. The F1 score does not seem to handle this, as my code produces an F1 score of 0.0. I assume this happens because the labels do not have the same name/number, but I cannot manually switch the label names for each cluster, as that is way too much work, especially for huge datasets. Is there a more generic solution to this?
Example code:
from sklearn.metrics import f1_score
if __name__ == '__main__':
    labels = [0, 0, 0, 1, 1, 1]
    pred = [1, 1, 1, 0, 0, 0]
    print(f1_score(labels, pred, average='micro'))
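For comparison, here is the same toy example scored with ARI and NMI (which are invariant to a relabeling of the clusters) next to the micro-averaged F1. This is just an illustrative check using the sklearn functions mentioned above:
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score, f1_score

labels = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]

# ARI and NMI only look at which samples are grouped together, not at the label values
print(adjusted_rand_score(labels, pred))           # 1.0
print(normalized_mutual_info_score(labels, pred))  # 1.0

# F1 compares the label values directly, so the swapped labels score 0
print(f1_score(labels, pred, average='micro'))     # 0.0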
Upvotes: 1
Views: 682
Reputation: 759
The F1 score is calculated as:
2*((precision*recall)/(precision+recall))
As I am sure you are aware, precision is defined as:
TP/(TP+FP)
Recall is:
TP/(TP+FN)
So in the case above TP=0, FP=3, FN=3.
Therefore precision and recall are both 0, which in turn makes your F1 score calculation look like:
2*((0*0)/(0+0))
Strictly speaking that is a division by zero; scikit-learn handles it by returning 0.0 rather than raising an error.
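To make that concrete, here is a quick check of those numbers on the example above (just a sketch using sklearn's precision_score and recall_score):
from sklearn.metrics import precision_score, recall_score, f1_score

labels = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]

# no prediction matches its true label, so precision and recall are both 0
print(precision_score(labels, pred, average='micro'))  # 0.0
print(recall_score(labels, pred, average='micro'))     # 0.0

# scikit-learn returns 0.0 for the resulting 0/0 instead of raising an error
print(f1_score(labels, pred, average='micro'))         # 0.0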
So in your case you will have to relabel the predictions to match the ground truth before scoring; the F1 score has no way of knowing which predicted cluster corresponds to which true class. The issue is in how the labels are assigned to your clusters (or in your test data), not in f1_score itself.
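If you do want a generic way to undo the label permutation, one common approach (not part of the answer above, so treat it as a sketch) is to match predicted cluster labels to true labels with the Hungarian algorithm on the confusion matrix and then compute F1 on the remapped predictions. align_labels below is a hypothetical helper; it assumes scipy is available and that the labels are hashable (e.g., integers):
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix, f1_score

def align_labels(y_true, y_pred):
    # hypothetical helper: remap each predicted cluster label to the true label
    # it overlaps with most (Hungarian algorithm on the confusion matrix)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=classes)
    # rows = true labels, columns = predicted labels; maximize the matched counts
    row_ind, col_ind = linear_sum_assignment(-cm)
    mapping = {classes[c]: classes[r] for r, c in zip(row_ind, col_ind)}
    return np.array([mapping[p] for p in y_pred])

labels = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]
print(f1_score(labels, align_labels(labels, pred), average='micro'))  # 1.0
With that remapping the F1 score on the toy example goes from 0.0 to 1.0, while NMI and ARI are unaffected either way.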
Upvotes: 1