Reputation: 11
I'm using scikit-learn (version 0.22.1) for a machine learning application.
I'm using a Random Forest classifier, and I'm having some trouble evaluating its performance with precision and recall. I have the labels of my test set (Y_test) and the labels predicted by the Random Forest (Y_pred). Both contain two labels (1 and 0).
In detail, I have this matrix:
print(confusion_matrix(y_true=Y_test, y_pred=Y_pred, labels=[1,0]))
[[78 20]
[36 41]]
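To read the cells out of the matrix I used the following (a minimal sketch; I'm assuming that with labels=[1,0] both rows and columns are ordered as [1, 0], so the layout is [[tp, fn], [fp, tn]]):
from sklearn.metrics import confusion_matrix

# with labels=[1, 0] the matrix is [[tp, fn], [fp, tn]],
# and ravel() flattens it row by row
tp, fn, fp, tn = confusion_matrix(y_true=Y_test, y_pred=Y_pred, labels=[1, 0]).ravel()
print(tp, fn, fp, tn)  # 78 20 36 41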
Consequently:
True Positive (tp) = 78
False Negative (fn) = 20
False Positive (fp) = 36
So:
PRECISION = tp/(tp+fn) = 78/(78+20) = 0.7959183673469388
RECALL = tp/(tp+fp) = 78/(78+36) = 0.6842105263157895
However, using this code:
precision = precision_score(Y_test, Y_pred, pos_label=1)
recall = recall_score(y_true=Y_test, y_pred=Y_pred, pos_label=1)
print("precision: ",precision)
print("recall: ",recall)
I get the following output:
precision:  0.6842105263157895
recall:  0.7959183673469388
It seems that the values are swapped when they are computed with the standard scikit-learn functions. Did I do something wrong? Can you please give me some advice?
Thanks,
Daniele
Upvotes: 1
Views: 806
Reputation: 411
You are currently calculating those values incorrectly. The correct formulas are:
Precision Calculation:
precision = tp/(tp+fp)
Recall Calculation:
recall = tp/(tp+fn)
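For example, plugging the numbers from your confusion matrix (tp = 78, fn = 20, fp = 36) into these formulas, a minimal sketch:
tp, fn, fp = 78, 20, 36      # read from the confusion matrix in the question

precision = tp / (tp + fp)   # 78 / 114 = 0.6842105263157895
recall = tp / (tp + fn)      # 78 / 98  = 0.7959183673469388

print("precision:", precision)
print("recall:", recall)
These are exactly the values that precision_score and recall_score return in your example, so scikit-learn is not swapping anything; the two hand-computed formulas were simply interchanged.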
Reference: https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall
Upvotes: 1