Reputation: 393
I am working on a multiclass classification project, and I noticed that no matter which classifier I run, precision and recall are identical within a model.
The classification problem has three distinct classes. The data set is rather small, with roughly 13k instances split into train (0.8) and test (0.2) sets.
The training data has a shape of (10608, 28) and the labels have a shape of (10608, 3) (binarized labels).
The classes are imbalanced.
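For context, the labels were binarized (one-hot encoded) before training, roughly like this (a minimal sketch with made-up example labels, not my real data):

from sklearn.preprocessing import LabelBinarizer
import numpy as np

# Illustrative raw labels with three classes; on the real training set this
# step yields the (10608, 3) label matrix mentioned above.
y_raw = np.array(["a", "b", "c", "a", "c"])
labels_bin = LabelBinarizer().fit_transform(y_raw)  # shape (n_samples, n_classes)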
I am comparing different classifiers in order to later focus on the most promising ones. While calculating precision and recall for each model, I noticed that they are always identical within a given model.
Given how precision and recall are calculated (precision = TP / (TP + FP), recall = TP / (TP + FN)), they can only be equal when the number of false-positive predictions equals the number of false-negative predictions, i.e. FP = FN.
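A quick sanity check of that identity on a toy binary example (made-up numbers) where FP = FN = 1:

from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
# TP = 2, FP = 1, FN = 1, so both scores come out as 2/3.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))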
Examples:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score
sgd_clf = OneVsRestClassifier(SGDClassifier(random_state=42))
sgd_clf.fit(data_tr, labels_tr)
y_pred_sgd = cross_val_predict(sgd_clf, data_tr, labels_tr, cv=5)
cm_sgd = confusion_matrix(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1))
cm_sgd:
array([[1038, 19, 2084],
[ 204, 22, 249],
[ 931, 48, 6013]], dtype=int64)
precision_score(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1), average="micro")
0.666760935143288
recall_score(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1), average="micro")
0.666760935143288
FP=FN=3535
from sklearn.linear_model import LogisticRegression
lr_clf = OneVsRestClassifier(LogisticRegression(random_state=42, max_iter=4000))
lr_clf.fit(data_tr, labels_tr)
y_pred_lr = cross_val_predict(lr_clf, data_tr, labels_tr, cv=5)
cm_lr = confusion_matrix(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1))
cm_lr:
array([[ 982, 1, 2158],
[ 194, 7, 274],
[ 774, 9, 6209]], dtype=int64)
precision_score(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1), average="micro")
0.6785444947209653
recall_score(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1), average="micro")
0.6785444947209653
FP=FN=3410
from sklearn.ensemble import RandomForestClassifier
rf_clf = OneVsRestClassifier(RandomForestClassifier(random_state=42))
rf_clf.fit(data_tr, labels_tr)
y_pred_forest = cross_val_predict(rf_clf, data_tr, labels_tr, cv=5)
cm_forest = confusion_matrix(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1))
cm_forest:
array([[1576, 56, 1509],
[ 237, 45, 193],
[1282, 61, 5649]], dtype=int64)
precision_score(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1), average="micro")
0.6853318250377074
recall_score(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1), average="micro")
0.6853318250377074
FP=FN=3338
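For reference, the FP = FN totals quoted above are simply the off-diagonal counts of each confusion matrix: every misclassified sample adds one false negative (for its true class) and one false positive (for the predicted class), so the two totals always match. A small helper (my own, for illustration) reproduces the numbers:

import numpy as np

def off_diagonal_total(cm):
    # Total number of misclassified samples = sum of all off-diagonal entries.
    cm = np.asarray(cm)
    return cm.sum() - np.trace(cm)

off_diagonal_total(cm_forest)  # 3338, matching FP = FN above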
How likely is it that all the models have the same recall and precision within a model? Am I missing something?
Upvotes: 3
Views: 2632
Reputation: 5174
This is happening because you are calculating the micro average of your scores. In the docs, it is described as:
Calculate metrics globally by counting the total true positives, false negatives and false positives.
Now here is the catch: in classification tasks where every test case is guaranteed to be assigned to exactly one class, computing a micro average is equivalent to computing the accuracy score. This is why you get the same result for precision and recall in each model: you are basically computing the accuracy in all cases.
You can verify this by using accuracy_score and comparing the results.
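A minimal sketch of that check, reusing the variables from your question (labels_tr and y_pred_sgd):

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = labels_tr.argmax(axis=1)
y_pred = y_pred_sgd.argmax(axis=1)

# For single-label multiclass data, all three lines print the same value.
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="micro"))
print(recall_score(y_true, y_pred, average="micro"))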
As a consequence, you would be better off evaluating the precision and recall of your models with either the macro or the weighted average instead.
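For example, with the same y_true and y_pred as above:

from sklearn.metrics import precision_score, recall_score

# Unweighted mean of the per-class scores:
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
# Mean of the per-class scores weighted by class support, which accounts
# for the imbalance you mentioned:
print(precision_score(y_true, y_pred, average="weighted"))
print(recall_score(y_true, y_pred, average="weighted"))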
Upvotes: 5