Mav

Reputation: 117

Is there a way to implement a 2x2 confusion matrix for a multilabel classifier?

I'm interested in creating a 2x2 confusion matrix for a multilabel classification problem that shows only the total true/false positives and negatives.

I have a section of code that generates a full confusion matrix, but with 98 labels it's nearly impossible to read. I don't care much about having the full matrix, so a 2x2 showing only the four counts mentioned above would be ideal; I'm just not sure how to implement it.

Here's the code snippet, if it helps:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report

predictions_d7 = model_d7.predict(x_test_d7)

# Collapse one-hot rows / probability outputs to class indices
y_pred = np.argmax(predictions_d7, axis=1)
y_test = np.argmax(Y_test_d7, axis=1)

print(y_test)
print(y_pred)

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=label_list)
fig, ax = plt.subplots(figsize=(20, 20))
disp.plot(ax=ax, values_format="d", cmap='gray')
disp.im_.colorbar.remove()
print(classification_report(y_test, y_pred))

Upvotes: 0

Views: 685

Answers (2)

user11989081

Reputation: 8663

You could calculate a 2x2 confusion matrix as follows:

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

def confusion_matrix(y_true, y_pred):
    # Count true positives, true negatives, false positives and false
    # negatives across all labels at once (micro-aggregated totals).
    tp = np.logical_and(y_pred == 1, y_true == 1).sum()
    tn = np.logical_and(y_pred == 0, y_true == 0).sum()
    fp = np.logical_and(y_pred == 1, y_true == 0).sum()
    fn = np.logical_and(y_pred == 0, y_true == 1).sum()

    return tp, tn, fp, fn

X, y = make_multilabel_classification(random_state=42)

clf = RandomForestClassifier(max_depth=3, random_state=42)
clf.fit(X, y)

y_pred = clf.predict(X)

tp, tn, fp, fn = confusion_matrix(y, y_pred)
print(tp, tn, fp, fn)
# 114 314 7 65
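For comparison, scikit-learn also ships sklearn.metrics.multilabel_confusion_matrix, which returns one 2x2 matrix per label; summing over the label axis collapses them into the same aggregate counts. A minimal sketch, reusing y and y_pred from above:

from sklearn.metrics import multilabel_confusion_matrix

# Shape (n_labels, 2, 2); each slice is [[tn, fp], [fn, tp]] for one label
mcm = multilabel_confusion_matrix(y, y_pred)

# Sum over the label axis to get a single aggregate 2x2 matrix
total = mcm.sum(axis=0)
print(total)  # [[tn, fp], [fn, tp]]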

Upvotes: 0

dvr

Reputation: 370

The reason you get a 2x2 matrix in the case you are hoping for is that there are precisely two labels. You can think of these as labels 1 and 2, or true and false; it doesn't matter.

However, try adding a third label and think about how you might compute "true positive". Is it even possible?

No; it must be a 3x3 matrix, since there are three possibilities for the true class and three for the predicted class, giving nine combinations in total. For example: it was class 1 and you predicted class 1, it was class 1 but you predicted class 2, and so on.

Perhaps you should use the nxn confusion matrix you receive and then use some common metrics to assess your model (accuracy, precision, recall, etc.). You can still compute these in n dimensions. See this Stack Exchange post for a description: https://stats.stackexchange.com/questions/91044/how-to-calculate-precision-and-recall-in-a-3-x-3-confusion-matrix
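As a rough illustration (the 3x3 matrix here is made up for demonstration), per-class precision and recall can be read off an nxn confusion matrix with a few numpy reductions:

import numpy as np

# Hypothetical 3x3 confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50,  2,  3],
               [ 4, 40,  6],
               [ 5,  7, 30]])

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # divide by column sums (total predicted per class)
recall = tp / cm.sum(axis=1)      # divide by row sums (total actual per class)
accuracy = tp.sum() / cm.sum()

print(precision, recall, accuracy)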

Upvotes: 2
