Reputation: 2743
So I have a binary classification model that gets really good scores in the training, validation, and testing phases.
validation_generator.reset()  # reset the validation generator for testing
loss: 0.0725 - accuracy: 0.9750 - val_loss: 0.1703 - val_accuracy: 0.9328
scores = model.evaluate_generator(validation_generator, workers=1, use_multiprocessing=False, verbose=1)
print(scores)
[0.023366881534457207, 0.9353214502334595]
OK, so that looks really good to me, correct? But when I compute the confusion matrix, everything gets grouped into one class, which is totally wrong.
Confusion Matrix
[[1045 0]
[1537 0]]
Here is the CM code:
validation_generator.reset()
Y_pred = model.predict_generator(validation_generator, validation_generator.samples // BATCH_SIZE+1)
y_pred = np.argmax(Y_pred, axis=1)
print(confusion_matrix(validation_generator.classes, y_pred))
target_names = ['male', 'female']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
That should not be the case, I don't think. It might be an issue with the generators, but they look correct to me.
BATCH_SIZE = 32
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
horizontal_flip=True,
validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
DATA_PATH,
target_size=(224, 224),
shuffle=True,
batch_size=BATCH_SIZE,
class_mode='binary',
subset='training') # set as training data
validation_generator = train_datagen.flow_from_directory(
DATA_PATH, # same directory as training data
target_size=(224, 224),
batch_size=BATCH_SIZE,
shuffle=False,
class_mode='binary',
subset='validation') # set as validation data
Should I set the validation batch size to 1?
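For reference, a batch size of 1 should not be necessary here. What matters is that shuffle=False keeps the prediction order aligned with validation_generator.classes, and that the number of prediction steps covers every sample. A minimal sketch of the step calculation, assuming the validation_generator and BATCH_SIZE defined above:

import math

# ceil covers the last partial batch without wrapping around past the end of the data
prediction_steps = math.ceil(validation_generator.samples / BATCH_SIZE)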
Here is the training call if that helps.
history = model.fit_generator(
train_generator,
steps_per_epoch = train_generator.samples // BATCH_SIZE,
validation_data = validation_generator,
validation_steps = validation_generator.samples // BATCH_SIZE,
epochs = EPOCHS,
verbose=1,
callbacks=callbacks_list)
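As a side note, if this is running on TensorFlow 2.1 or newer, the *_generator methods (fit_generator, evaluate_generator, predict_generator) are deprecated and model.fit / model.evaluate / model.predict accept generators directly. A minimal sketch of the equivalent training call, assuming the same generators, EPOCHS, and callbacks_list as above:

# TF >= 2.1: model.fit accepts Keras generators / Sequence objects directly
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // BATCH_SIZE,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // BATCH_SIZE,
    epochs=EPOCHS,
    verbose=1,
    callbacks=callbacks_list)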
UPDATE AND FIX FOR THIS PROBLEM:
Add this to the code, in place of the argmax line:
y_pred = Y_pred.flatten()  # flatten the (n, 1) sigmoid outputs to a 1-D array
y_pred[y_pred <= 0.5] = 0.
y_pred[y_pred > 0.5] = 1.
# Old code
# y_pred = np.argmax(Y_pred, axis=1)  # argmax does not work for a single-column output
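For context, the reason argmax fails here: with class_mode='binary' the model (presumably) ends in a single sigmoid unit, so Y_pred has shape (n_samples, 1), and np.argmax over axis 1 of a one-column array always returns 0, which is exactly the all-one-class confusion matrix above. A minimal end-to-end sketch of the thresholding fix, assuming the generators, model, and BATCH_SIZE defined above:

import math
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

validation_generator.reset()
steps = math.ceil(validation_generator.samples / BATCH_SIZE)

# Shape (n_samples, 1): one sigmoid probability per image
Y_pred = model.predict_generator(validation_generator, steps)

# Threshold the probabilities instead of taking argmax
y_pred = (Y_pred.ravel() > 0.5).astype(int)

print(confusion_matrix(validation_generator.classes, y_pred))
print(classification_report(validation_generator.classes, y_pred,
                            target_names=['male', 'female']))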
Upvotes: 1
Views: 92
Reputation: 542
As far as I understand, you are doing binary classification, and I see in your code that you are using np.argmax(Y_pred, axis=1). I think argmax should only be used for multi-class classification.
As a solution, you should try something like y_pred = [y[0] >= 0.5 for y in Y_pred].
Note that I'm not sure this exact code works, but I am sure that np.argmax() needs to be replaced.
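To make that concrete, a one-line version of the same idea (assuming Y_pred is the raw output of predict_generator, as in the question):

import numpy as np

# Turn the (n_samples, 1) sigmoid probabilities into hard 0/1 labels
y_pred = (np.asarray(Y_pred).ravel() >= 0.5).astype(int)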
Upvotes: 1