Reputation: 6826
I want to use sklearn.metrics.confusion_matrix(y_true, y_pred) to create a confusion matrix for a keras model. After training a model I can use predict_generator(generator) to get predictions for a test dataset, which gives me y_pred. How can I get the corresponding true labels, y_true, from a data generator?
Upvotes: 4
Views: 7188
Reputation: 6826
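After creating a data generator, either your own or one produced by the built-in ImageDataGenerator (for example via flow_from_directory, created with shuffle=False so the sample order stays fixed), read the true labels from its classes attribute and use your trained model to make predictions: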
true_labels = data_generator.classes
predictions = model.predict_generator(data_generator)
sklearn's confusion_matrix expects 1-d arrays of class labels, so you have to convert the predicted probabilities to class indices using np.argmax():
y_true = true_labels
y_pred = np.argmax(predictions, axis=1)
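As a minimal sketch of that conversion, the snippet below uses a small hand-written array in place of the real model output. The `predictions` and `binary_probs` values are made up for illustration. It also covers one case the answer does not mention: if the model ends in a single sigmoid unit, the output has shape (n, 1) and argmax would always return 0, so a threshold is needed instead (0.5 here is an assumption, not a rule).

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])

# Index of the largest probability in each row gives the predicted class
y_pred = np.argmax(predictions, axis=1)
print(y_pred)  # [1 0 2 1]

# Hypothetical single-unit sigmoid outputs, shape (4, 1)
binary_probs = np.array([[0.9], [0.2], [0.6], [0.4]])

# Threshold instead of argmax, then flatten to a 1-d label array
y_pred_binary = (binary_probs > 0.5).astype(int).ravel()
print(y_pred_binary)  # [1 0 1 0]
```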
Then you can use those variables directly in the confusion_matrix function:
cm = sklearn.metrics.confusion_matrix(y_true, y_pred)
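To make the layout of the result concrete, here is a small sketch on toy labels rather than real model output (the arrays are invented for illustration). In sklearn's convention, rows correspond to true classes and columns to predicted classes, so correct predictions sit on the diagonal.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels standing in for generator.classes and the argmax of the predictions
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Correct predictions lie on the diagonal, so accuracy is trace / total
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

If the accuracy computed this way is far below what model.evaluate reports on the same data, that usually points to the labels and predictions being out of order.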
And you can plot it using the example plot_confusion_matrix() function found here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
Upvotes: 1
Reputation: 5822
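generator.classes will give you the observed values in sparse format, i.e. as integer class indices. sklearn's confusion_matrix accepts these directly. If you need them in dense (i.e., one-hot encoded) format for something else, you could get that with: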
import pandas as pd
# get_dummies already returns a dense DataFrame; DataFrame.to_dense() was removed in pandas 1.0
pd.get_dummies(pd.Series(generator.classes))
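A short sketch of that round trip, assuming a small made-up `classes` array in place of generator.classes. It shows the one-hot frame and how argmax maps it back to integer labels. One caveat worth knowing: get_dummies only creates columns for classes that actually occur in the data.

```python
import numpy as np
import pandas as pd

# Stands in for generator.classes (integer class indices)
classes = np.array([0, 2, 1, 2, 0])

# One column per class that occurs in the data, one row per sample
one_hot = pd.get_dummies(pd.Series(classes), dtype=int)
print(one_hot)

# argmax along each row recovers the integer labels,
# valid here because the classes present are exactly 0, 1, 2
recovered = one_hot.to_numpy().argmax(axis=1)
print(recovered)  # [0 2 1 2 0]
```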
NOTE though: you must set the generator's shuffle attribute to False before generating the predictions and fetching the observed classes, otherwise your predictions and observations will not line up!
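The effect of shuffling can be simulated without Keras. In the sketch below, `classes` stands in for generator.classes, which stays in file order, and a hypothetical perfect model returns the true label of whatever sample it is fed. A fixed rotation is used in place of a random shuffle so the result is reproducible.

```python
import numpy as np

# Stands in for generator.classes: always in file order
classes = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Order in which samples reach the model
order_unshuffled = np.arange(len(classes))
order_shuffled = np.roll(order_unshuffled, 3)  # deterministic stand-in for shuffle=True

# A hypothetical perfect model: prediction equals the true label of each sample seen
pred_unshuffled = classes[order_unshuffled]
pred_shuffled = classes[order_shuffled]

# Comparing against generator.classes only works when the order matches
acc_unshuffled = np.mean(pred_unshuffled == classes)
acc_shuffled = np.mean(pred_shuffled == classes)
print(acc_unshuffled, acc_shuffled)  # 1.0 0.0
```

Even a perfect model scores 0.0 here once the order differs, which is why a confusion matrix built from a shuffled generator tends to look like noise.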
Upvotes: 4