Reputation: 85
I have two DataFrames for which I would like to compute a confusion matrix.
Below is an example of how 'df_responses' is structured:
Question 1 | Red | Blue | Yellow | None of the Above |
Participant ID | | | | |
1 | 1 | 1 | 1 | 0 |
2 | 0 | 0 | 0 | 1 |
3 | 1 | 0 | 1 | 0 |
And below is an example of how 'df_actual' is structured:
Question 1 | Red | Blue | Yellow | None of the Above |
| | | | |
1 | 1 | 0 | 1 | 0 |
2 | 1 | 0 | 1 | 0 |
3 | 1 | 0 | 1 | 0 |
Ideally, I would also like to create a new DataFrame that contains the True Positive and False Negative score for each participant as follows:
Question 1 | True Positive | False Negative |
Participant ID | | |
1 | 2 | 0 |
2 | 0 | 2 |
3 | 2 | 0 |
I have tried (@John Mommers):
for x in range(len(df_responses)):
    tn, fp, fn, tp = confusion_matrix(df_responses, df_actual).ravel()
    print (f'Nr:{i} true neg:{tn} false pos:{fp} false neg:{fn} true pos:{tp}')
However, I get a
ValueError: multilabel-indicator is not supported.
Is there perhaps another way I can compute TP and FN?
Addition (data as text):
df_responses
{'Red': {1: 1, 2: 0, 3: 1},
'Blue': {1: 1, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 0, 3: 1},
'None of the above': {1: 0, 2: 1, 3: 0}}
df_actual
{'Red': {1: 1, 2: 1, 3: 1},
'Blue': {1: 0, 2: 0, 3: 0},
'Yellow': {1: 1, 2: 1, 3: 1},
'None of the above': {1: 0, 2: 0, 3: 0}}
Upvotes: 1
Views: 1418
Reputation: 11161
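You can create the DataFrame you want with, e.g.:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df["tp"] = np.sum((df_actual == 1) & (df_responses == 1), axis=1)
df["fp"] = np.sum((df_actual == 0) & (df_responses == 1), axis=1)
df["fn"] = np.sum((df_actual == 1) & (df_responses == 0), axis=1)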
Note that this is not really a confusion matrix. In a confusion matrix, the rows are the predicted values and the columns are the label values (or vice versa), and the cells hold counts. That may not be well-defined for multi-valued labels/responses, which is why you're getting the error from sklearn.
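As a self-contained sketch of the per-participant counts described above, here is the same idea run on the data posted in the question. The name `tp_fn` and the output column labels are my own choices for illustration; the approach assumes both DataFrames share the same index and columns.

```python
import pandas as pd

# Data copied from the "data as text" part of the question.
df_responses = pd.DataFrame({
    "Red": {1: 1, 2: 0, 3: 1},
    "Blue": {1: 1, 2: 0, 3: 0},
    "Yellow": {1: 1, 2: 0, 3: 1},
    "None of the above": {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    "Red": {1: 1, 2: 1, 3: 1},
    "Blue": {1: 0, 2: 0, 3: 0},
    "Yellow": {1: 1, 2: 1, 3: 1},
    "None of the above": {1: 0, 2: 0, 3: 0},
})

# Element-wise boolean masks, summed across the columns of each row.
tp_fn = pd.DataFrame({
    # actual is 1 and the participant also answered 1
    "True Positive": ((df_actual == 1) & (df_responses == 1)).sum(axis=1),
    # actual is 1 but the participant answered 0
    "False Negative": ((df_actual == 1) & (df_responses == 0)).sum(axis=1),
})
tp_fn.index.name = "Participant ID"
print(tp_fn)
```

For this data the result should match the table the question asks for (2/0, 0/2, 2/0 for participants 1 to 3).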
Upvotes: 1
Reputation: 991
You can't use sklearn's confusion_matrix function in this way because it only supports one-dimensional labels, and in your case each row has four label columns. That's why you're getting the error multilabel-indicator is not supported.
So you have to pass each row of your DataFrame to this function separately:
from sklearn.metrics import confusion_matrix
for x in range(len(df_responses)):
    y_responses = df_responses.iloc[x].to_numpy()
    y_actual = df_actual.iloc[x].to_numpy()
    # confusion_matrix expects (y_true, y_pred); labels=[0, 1] forces a 2x2 result
    tn, fp, fn, tp = confusion_matrix(y_actual, y_responses, labels=[0, 1]).ravel()
    print(f'Nr:{x} true neg:{tn} false pos:{fp} false neg:{fn} true pos:{tp}')
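As a possible alternative to looping over rows, sklearn also has multilabel_confusion_matrix, which with samplewise=True returns one 2x2 matrix per row. The sketch below uses the data from the question; the name `per_participant` is only illustrative, and it assumes both DataFrames have their columns in the same order.

```python
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix

# Data copied from the "data as text" part of the question.
df_responses = pd.DataFrame({
    "Red": {1: 1, 2: 0, 3: 1},
    "Blue": {1: 1, 2: 0, 3: 0},
    "Yellow": {1: 1, 2: 0, 3: 1},
    "None of the above": {1: 0, 2: 1, 3: 0},
})
df_actual = pd.DataFrame({
    "Red": {1: 1, 2: 1, 3: 1},
    "Blue": {1: 0, 2: 0, 3: 0},
    "Yellow": {1: 1, 2: 1, 3: 1},
    "None of the above": {1: 0, 2: 0, 3: 0},
})

# samplewise=True gives an array of shape (n_participants, 2, 2),
# where each 2x2 matrix is laid out as [[tn, fp], [fn, tp]].
mcm = multilabel_confusion_matrix(
    df_actual.to_numpy(),      # y_true
    df_responses.to_numpy(),   # y_pred
    samplewise=True,
)

per_participant = pd.DataFrame(
    {
        "tn": mcm[:, 0, 0],
        "fp": mcm[:, 0, 1],
        "fn": mcm[:, 1, 0],
        "tp": mcm[:, 1, 1],
    },
    index=df_actual.index,
)
print(per_participant)
```

This keeps the (y_true, y_pred) argument order explicit, which matters because swapping the two swaps the false positive and false negative counts.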
Upvotes: 0