Reputation: 4125
I am working with multi-class, multi-label output from my classifier. The total number of classes is 14, and each instance can have multiple classes associated with it. For example:
y_true = np.array([[0,0,1], [1,1,0], [0,1,0]])
y_pred = np.array([[0,0,1], [1,0,1], [1,0,0]])
The way I am making my confusion matrix right now:
matrix = confusion_matrix(y_true.argmax(axis=1), y_pred.argmax(axis=1))
print(matrix)
Which gives an output like:
[[ 79 0 0 0 66 0 0 151 1 8 0 0 0 0]
[ 4 0 0 0 11 0 0 27 0 0 0 0 0 0]
[ 14 0 0 0 21 0 0 47 0 1 0 0 0 0]
[ 1 0 0 0 4 0 0 25 0 0 0 0 0 0]
[ 18 0 0 0 50 0 0 63 0 3 0 0 0 0]
[ 4 0 0 0 3 0 0 19 0 0 0 0 0 0]
[ 2 0 0 0 3 0 0 11 0 2 0 0 0 0]
[ 22 0 0 0 20 0 0 138 1 5 0 0 0 0]
[ 12 0 0 0 9 0 0 38 0 1 0 0 0 0]
[ 10 0 0 0 3 0 0 40 0 4 0 0 0 0]
[ 3 0 0 0 3 0 0 14 0 3 0 0 0 0]
[ 0 0 0 0 2 0 0 3 0 0 0 0 0 0]
[ 2 0 0 0 11 0 0 32 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 3 0 0 0 0 0 7]]
Now, I am not sure if the confusion matrix from sklearn is capable of handling multi-label multi-class data. Could someone help me with this?
Upvotes: 14
Views: 47113
Reputation: 51
There is a method for creating a Multi-Label Confusion Matrix (MLCM) in the shape of a two-dimensional (n+1 by n+1) matrix.
To install "mlcm" and see an example of how to use it, go to: https://pypi.org/project/mlcm/
Like the multi-class (single-label) confusion matrix, the MLCM shows how the false negatives of one class are distributed over the other classes. In multi-label data, the number of true labels per instance can range from zero to n (the number of classes), and so can the number of predicted labels. To handle instances with no true label and/or no predicted label, the MLCM adds one extra row and one extra column to the confusion matrix, so it has n+1 rows and n+1 columns. Rows (and columns) 0 to n-1 correspond to classes 0 to n-1, respectively. The last row corresponds to instances that have no true label, and the last column corresponds to instances for which the classifier predicts no class.
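A minimal sketch of how this might be used, assuming the mlcm package exposes a cm() function as shown in the PyPI example (check the project page above for the exact API):
import numpy as np
from mlcm import mlcm  # assumed import path, per the PyPI example

# Binary indicator matrices: rows are instances, columns are classes.
y_true = np.array([[0, 0, 1], [1, 1, 0], [0, 1, 0]])
y_pred = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 0]])

# cm() is assumed to return the raw (n+1 by n+1) matrix and a normalized copy.
conf_mat, normal_conf_mat = mlcm.cm(y_true, y_pred)
print(conf_mat)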
Please read the following paper for more information:
M. Heydarian, T. Doyle, and R. Samavi, MLCM: Multi-Label Confusion Matrix,
IEEE Access, Feb. 2022, DOI: 10.1109/ACCESS.2022.3151048
Upvotes: 5
Reputation: 513
Since version 0.21, scikit-learn provides sklearn.metrics.multilabel_confusion_matrix:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.multilabel_confusion_matrix.html
In this example we predict two labels for each instance:
import numpy as np
import sklearn.metrics as skm

y_true = np.array([
    [0,0], [0,1], [1,1], [0,1], [0,1], [1,1]
])
y_pred = np.array([
    [1,1], [0,1], [0,1], [1,0], [0,1], [1,1]
])

# One 2x2 binary confusion matrix per label
cm = skm.multilabel_confusion_matrix(y_true, y_pred)
print(cm)
print(skm.classification_report(y_true, y_pred))
Confusion matrix for the labels (each 2x2 block corresponds to one label and is laid out as [[TN, FP], [FN, TP]]):
[[[2 2]
[1 1]]
[[0 1]
[1 4]]]
Classification report:
              precision    recall  f1-score   support

           0       0.33      0.50      0.40         2
           1       0.80      0.80      0.80         5

   micro avg       0.62      0.71      0.67         7
   macro avg       0.57      0.65      0.60         7
weighted avg       0.67      0.71      0.69         7
 samples avg       0.67      0.58      0.61         7
Upvotes: 28
Reputation: 5822
What you need to do is generate multiple binary confusion matrices, since what you essentially have is a set of binary labels (one per class).
Something along the lines of:
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([[0,0,1], [1,1,0], [0,1,0]])
y_pred = np.array([[0,0,1], [1,0,1], [1,0,0]])

labels = ["A", "B", "C"]

conf_mat_dict = {}

# Build one binary confusion matrix per label column.
for label_col in range(len(labels)):
    y_true_label = y_true[:, label_col]
    y_pred_label = y_pred[:, label_col]
    conf_mat_dict[labels[label_col]] = confusion_matrix(y_pred=y_pred_label, y_true=y_true_label)

for label, matrix in conf_mat_dict.items():
    print("Confusion matrix for label {}:".format(label))
    print(matrix)
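Running this should print one 2x2 binary confusion matrix per label (rows are the true values 0/1, columns are the predicted values 0/1). For the arrays above, the output looks like:
Confusion matrix for label A:
[[1 1]
 [0 1]]
Confusion matrix for label B:
[[1 0]
 [2 0]]
Confusion matrix for label C:
[[1 1]
 [0 1]]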
Upvotes: 25