Reputation: 3
I want to evaluate a machine learning system by computing the F1 score of my predictions with scikit-learn's f1_score. However, the results are not what I expect. Calling confusion_matrix on my labels and predictions gives
[[ 3 11]
[ 5 31]]
If I calculate the f1 score by hand as 2*(precision * recall) / (precision + recall), I get 2*(3/8 * 31/42)/(3/8 + 31/42) = 0.497. But calling f1_score(y_true, y_pred, average="binary") yields 0.7949. Does anybody have an explanation?
Even if I call f1_score with constant predictions of 1 while the true labels are mixed, I get high scores instead of the 0 (with a warning) that I was expecting; a minimal reproduction is below. I suspect f1_score does not compute what I think it does for average="binary", but I can't wrap my head around it.
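Here is a sketch of that constant-prediction case (the y_true/y_pred labels are made up for illustration):

    from sklearn.metrics import f1_score

    # Made-up labels: predicting 1 everywhere on mixed true labels
    y_true = [0, 1, 1, 0, 1, 1]
    y_pred = [1, 1, 1, 1, 1, 1]

    print(f1_score(y_true, y_pred, average="binary"))  # prints 0.8, not 0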
My scikit-learn version is 0.21.3.
Thanks for your help.
Upvotes: 0
Views: 436
Reputation: 147
There is a mistake in your manual calculation of the precision and recall values.
Precision = TruePositives / (TruePositives + FalsePositives)
Recall = TruePositives / (TruePositives + FalseNegatives)
In scikit-learn, confusion_matrix puts the true labels in the rows and the predictions in the columns, so your matrix reads TN = 3, FP = 11, FN = 5, TP = 31. With those values, precision = 31/42, recall = 31/36, and 2 * (31/42 * 31/36) / (31/42 + 31/36) ≈ 0.7949, exactly what f1_score returns.
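As a sketch (the y_true/y_pred arrays below are made up so that they reproduce the confusion matrix from your question), you can unpack the four counts with ravel() and recompute the score yourself:

    import numpy as np
    from sklearn.metrics import confusion_matrix, f1_score

    # Made-up labels reproducing the matrix from the question:
    # [[ 3 11]
    #  [ 5 31]]
    y_true = np.array([0] * 14 + [1] * 36)
    y_pred = np.array([0] * 3 + [1] * 11 + [0] * 5 + [1] * 31)

    # scikit-learn convention: rows = true labels, columns = predictions,
    # so ravel() on a binary confusion matrix returns (tn, fp, fn, tp).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    precision = tp / (tp + fp)  # 31 / 42
    recall = tp / (tp + fn)     # 31 / 36
    print(2 * precision * recall / (precision + recall))  # ~0.7949
    print(f1_score(y_true, y_pred, average="binary"))     # same value

This also explains your constant-1 experiment: predicting 1 everywhere makes FN = 0, so recall = 1 and the F1 score stays high. The 0 with a warning only occurs when there are no predicted positives at all, because precision is then undefined.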
Upvotes: 1