jxn

Reputation: 8025

Multi-label classification: confusion matrix has the wrong number of labels

I am feeding y_test and y_pred into a confusion matrix. My data is for multi-label classification, so each row is a one-hot encoding.

My data has 30 labels, but the confusion matrix I get back has only 11 rows and 11 columns, which confuses me. I expected it to be 30x30.

Both are NumPy arrays (y_test and y_pred start out as DataFrames, which I convert to NumPy arrays using dataframe.values).
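As an aside on the conversion step: the pandas documentation recommends `DataFrame.to_numpy()` over `.values`, and for a plain integer DataFrame both give the same array. A minimal sketch, using a hypothetical 3-row, 4-label DataFrame in place of the real 30-column data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy one-hot DataFrame standing in for y_test (3 rows, 4 labels)
y_test_df = pd.DataFrame([[1, 0, 0, 0],
                          [0, 0, 1, 0],
                          [0, 1, 0, 0]])

y_test_values = y_test_df.values      # the accessor used in the question
y_test_np = y_test_df.to_numpy()      # equivalent, recommended accessor

print(y_test_np.shape)                           # (3, 4)
print(np.array_equal(y_test_values, y_test_np))  # True
```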

y_test.shape

(8680, 30)

y_test

array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

y_pred.shape

(8680, 30)

y_pred

array([[1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

I convert them into a format that confusion_matrix accepts:

from sklearn.metrics import confusion_matrix

y_test2 = y_test.argmax(axis=1)  # one-hot rows -> one integer label per row
y_pred2 = y_pred.argmax(axis=1)
conf_mat = confusion_matrix(y_test2, y_pred2)
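To see what argmax does to each row, here is a small sketch with hypothetical 4-label rows (not the question's data). A one-hot row maps to the column of its single 1, but an all-zero row also maps to 0, and a row with several 1s maps to the first of them, so argmax silently drops information for rows that are genuinely multi-label.

```python
import numpy as np

# Hypothetical rows: one-hot, one-hot, all zeros, two labels set
rows = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0],
                 [0, 1, 0, 1]])

# argmax returns the index of the first maximum in each row
labels = rows.argmax(axis=1)
print(labels)  # [0 2 0 1]
```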

Here is what my confusion matrix looks like:

conf_mat.shape

(11, 11)

conf_mat

array([[4246,   77,   13,   72,   81,    4,    6,    3,    0,    0,    4],
       [ 106, 2010,   20,   23,   21,    0,    5,    2,    0,    0,    0],
       [ 143,   41,   95,   32,   10,    3,   14,    1,    1,    1,    2],
       [ 101,    1,    0,  351,   36,    0,    0,    0,    0,    0,    0],
       [ 346,   23,    7,   10,  746,    5,    6,    4,    3,    3,    2],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0]])

Why does my confusion matrix have shape 11x11? Shouldn't it be 30x30?

Upvotes: 0

Views: 891

Answers (2)

Alex

Reputation: 19104

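This just means that some labels never occur. When you don't pass labels, confusion_matrix only builds rows and columns for the labels that actually appear in y_test2 or y_pred2.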

y_test.any(axis=0)
y_pred.any(axis=0)

Between them, these should show that only 11 of the 30 columns contain any 1s.
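A more direct check, sketched with hypothetical toy arrays: without a labels argument, confusion_matrix uses the sorted union of the values that appear in either argument, so the side length of the matrix equals the number of distinct argmax values across y_test2 and y_pred2. The same union tells you which original column each row and column of conf_mat corresponds to.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical argmax outputs: only labels 0, 3, 7 and 12 ever occur
y_test2 = np.array([0, 3, 3, 7, 0])
y_pred2 = np.array([0, 3, 12, 7, 7])

present = np.union1d(y_test2, y_pred2)   # sorted labels occurring in either array
conf_mat = confusion_matrix(y_test2, y_pred2)

print(present)         # [ 0  3  7 12]
print(conf_mat.shape)  # (4, 4)
```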

Here's what happens when every label does occur:

import numpy as np
from sklearn.metrics import confusion_matrix

y_test = np.zeros((8680, 30))
y_pred = np.zeros((8680, 30))

# put a single 1 in a random column of every row (random one-hot labels)
y_test[np.arange(8680), np.random.randint(0, 30, 8680)] = 1
y_pred[np.arange(8680), np.random.randint(0, 30, 8680)] = 1

y_test2 = y_test.argmax(axis=1)
y_pred2 = y_pred.argmax(axis=1)

confusion_matrix(y_test2, y_pred2).shape  # (30, 30)
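If you want the full 30x30 matrix even when some labels never occur, scikit-learn's confusion_matrix accepts a labels argument that fixes the rows and columns. A minimal sketch, assuming the 30 one-hot columns correspond to labels 0 to 29 in order, with hypothetical toy data in which only a few labels occur:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

n_labels = 30  # assumption: one label per one-hot column, numbered 0..29

# Hypothetical argmax outputs in which only labels 0-4 occur
y_test2 = np.array([0, 1, 2, 3, 4, 0, 1])
y_pred2 = np.array([0, 1, 2, 4, 4, 1, 1])

# Without labels=..., the matrix covers only the labels that occur (5 x 5)
print(confusion_matrix(y_test2, y_pred2).shape)  # (5, 5)

# With labels=..., unused labels get all-zero rows and columns (30 x 30)
full = confusion_matrix(y_test2, y_pred2, labels=np.arange(n_labels))
print(full.shape)  # (30, 30)
```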

Upvotes: 0

BENY

Reputation: 323226

I think you are not quite clear on the definition of confusion_matrix:

from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

As a DataFrame, this is:

import pandas as pd

pd.DataFrame(confusion_matrix(y_true, y_pred), columns=[0, 1, 2], index=[0, 1, 2])

   0  1  2
0  2  0  0
1  0  0  1
2  1  0  2

The index (rows) holds the true labels and the columns hold the predicted labels; each is one of the categories that appears in the input.
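To make the definition concrete, here is a hand-rolled version (an illustrative sketch, not scikit-learn's implementation) that reproduces the 3x3 matrix above: each (true, predicted) pair increments the cell at that row and column, and only labels that occur in either list get a row and column.

```python
import numpy as np

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# Labels that occur in either list, sorted (mirrors the default behaviour)
labels = sorted(set(y_true) | set(y_pred))
index = {label: i for i, label in enumerate(labels)}

cm = np.zeros((len(labels), len(labels)), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[index[t], index[p]] += 1  # row = true label, column = predicted label

print(cm)
```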

You have (11, 11), which means only 11 distinct categories appear in your data (counting both y_test2 and y_pred2).
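Since the question mentions multi-label data: if a row can genuinely contain more than one 1, collapsing it with argmax loses information. scikit-learn (0.21 and later) also provides multilabel_confusion_matrix, which works on the indicator arrays directly and returns one 2x2 matrix per column, including columns that never contain a 1. A minimal sketch with hypothetical 4-label indicator arrays:

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Hypothetical indicator arrays: 3 samples, 4 labels (label 3 never occurs)
y_test = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0],
                   [1, 0, 0, 0]])

mcm = multilabel_confusion_matrix(y_test, y_pred)
print(mcm.shape)  # (4, 2, 2): one block per column

# Each block is laid out as [[TN, FP], [FN, TP]] for that label
print(mcm[0])
```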

Upvotes: 1
