student214

Reputation: 85

How to compute a confusion matrix derived from multiple columns?

I have got two DataFrames for which I would like to compute a confusion matrix.

Below is an example of how 'df_responses' is structured:

Question 1      |  Red  |  Blue  | Yellow | None of the Above |   
Participant ID  |       |        |        |                   |
1               |   1   |    1   |    1   |       0           |
2               |   0   |    0   |    0   |       1           |
3               |   1   |    0   |    1   |       0           |

And below is an example of how 'df_actual' is structured:

Question 1      |  Red  |  Blue  | Yellow | None of the Above |   
                |       |        |        |                   |
1               |   1   |    0   |    1   |       0           |
2               |   1   |    0   |    1   |       0           |
3               |   1   |    0   |    1   |       0           |

Ideally, I would also like to create a new DataFrame that contains the True Positive and False Negative score for each participant as follows:

Question 1      | True Positive | False Negative | 
Participant ID  |               |                | 
1               |     2         |       0        |
2               |     0         |       2        |
3               |     2         |       0        | 

I have tried (@John Mommers):

for x in range(len(df_responses)):
    tn, fp, fn, tp = confusion_matrix(df_responses, df_actual).ravel()
    print (f'Nr:{i}  true neg:{tn}  false pos:{fp}   false neg:{fn}   true pos:{tp}')

However, I get a

ValueError: multilabel-indicator is not supported. 

Is there perhaps another way I can compute TP and FN?


Addition (data as text):

df_responses

{'Red': {1: 1, 2: 0, 3: 1},
'Blue': {1: 1, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 0, 3: 1},
'None of the above': {1: 0, 2: 1, 3: 0}}

df_actual

{'Red': {1: 1, 2: 1, 3: 1},
'Blue': {1: 0, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 1, 3: 1},
'None of the above': {1: 0, 2: 0, 3: 0}}
  

Upvotes: 1

Views: 1418

Answers (2)

anon01

Reputation: 11161

You can create the df you want by e.g.:

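import numpy as np
import pandas as pd

df = pd.DataFrame()
df["tp"] = np.sum((df_actual == 1) & (df_responses == 1), axis=1)  # chosen and correct
df["fp"] = np.sum((df_actual == 0) & (df_responses == 1), axis=1)  # chosen but wrong
df["fn"] = np.sum((df_actual == 1) & (df_responses == 0), axis=1)  # correct but not chosen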

Note that this is not really a confusion matrix. In a confusion matrix the rows are the predicted values and the columns are the label values (or vice versa), and the entries are counts. That is not well defined when each sample has several labels/responses at once, which is why sklearn raises the error.
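As a minimal sketch of the same boolean-mask idea, here it is applied to the data posted in the question, producing the True Positive / False Negative table the asker described. The names actual, chosen and scores are only illustrative, and the sketch assumes both DataFrames share the same index and columns.

```python
import pandas as pd

# Data copied from the "Addition (data as text)" part of the question.
df_responses = pd.DataFrame({
    'Red': {1: 1, 2: 0, 3: 1},
    'Blue': {1: 1, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 0, 3: 1},
    'None of the above': {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    'Red': {1: 1, 2: 1, 3: 1},
    'Blue': {1: 0, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 1, 3: 1},
    'None of the above': {1: 0, 2: 0, 3: 0},
})

# Boolean masks: which options are correct, and which were chosen.
actual = df_actual.eq(1)
chosen = df_responses.eq(1)

# Count per participant (row): correct and chosen, correct but missed.
scores = pd.DataFrame({
    'True Positive': (actual & chosen).sum(axis=1),
    'False Negative': (actual & ~chosen).sum(axis=1),
})
scores.index.name = 'Participant ID'
print(scores)
```

For the posted data this gives 2/0, 0/2 and 2/0 for participants 1, 2 and 3, matching the table in the question.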

Upvotes: 1

Henrique Andrade

Reputation: 991

You can't use sklearn's confusion_matrix function this way because it only supports one-dimensional label arrays, and here each row has four label columns (a multilabel indicator matrix). That's why you get the error "multilabel-indicator is not supported".

So you have to pass each row of your DataFrames to this function separately.

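from sklearn.metrics import confusion_matrix

for x in range(len(df_responses)):
   y_responses = df_responses.iloc[x].to_numpy()
   y_actual = df_actual.iloc[x].to_numpy()
   # confusion_matrix expects (y_true, y_pred); labels=[0, 1] guarantees a 2x2 result
   tn, fp, fn, tp = confusion_matrix(y_actual, y_responses, labels=[0, 1]).ravel()
   print(f'Nr:{x} true neg:{tn} false pos:{fp} false neg:{fn} true pos:{tp}')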
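As a possible alternative to the row-by-row loop, sklearn also provides multilabel_confusion_matrix, which accepts multilabel indicator input and, with samplewise=True, returns one 2x2 matrix per row. The sketch below assumes the data from the question; the names mcm and scores are only illustrative.

```python
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix

# Data copied from the "Addition (data as text)" part of the question.
df_responses = pd.DataFrame({
    'Red': {1: 1, 2: 0, 3: 1},
    'Blue': {1: 1, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 0, 3: 1},
    'None of the above': {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    'Red': {1: 1, 2: 1, 3: 1},
    'Blue': {1: 0, 2: 0, 3: 0},
    'Yellow': {1: 1, 2: 1, 3: 1},
    'None of the above': {1: 0, 2: 0, 3: 0},
})

# samplewise=True yields one 2x2 matrix per participant (row),
# laid out as [[tn, fp], [fn, tp]].
mcm = multilabel_confusion_matrix(
    df_actual.to_numpy(), df_responses.to_numpy(), samplewise=True
)

scores = pd.DataFrame(
    {'True Positive': mcm[:, 1, 1], 'False Negative': mcm[:, 1, 0]},
    index=df_responses.index,
)
print(scores)
```

This avoids the explicit Python loop, at the cost of being a little less obvious about which cell holds which count.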

Upvotes: 0
