MNM

Reputation: 2743

Keras model gets high accuracy in training and validation, then fails on the confusion matrix

So I have a binary classification model that gets really good scores in the training, validation, and testing phases.

Here is the training log from the last epoch, followed by the evaluation:

loss: 0.0725 - accuracy: 0.9750 - val_loss: 0.1703 - val_accuracy: 0.9328

validation_generator.reset()  # reset the validation generator for testing
scores = model.evaluate_generator(validation_generator, workers=1, use_multiprocessing=False, verbose=1)
print(scores)
[0.023366881534457207, 0.9353214502334595]
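The two numbers returned by evaluate_generator line up with the model's metrics; a quick way to confirm which is which (a small sketch, not part of the original post):

print(model.metrics_names)  # typically ['loss', 'accuracy'] for this setup
print(scores)               # values are in the same order as metrics_names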

OK, so that looks really good to me, correct? But when I build the confusion matrix, everything gets grouped into one class, which is totally wrong.

Confusion Matrix
[[1045    0]
 [1537    0]]

Here is the CM code:

validation_generator.reset()
Y_pred = model.predict_generator(validation_generator, validation_generator.samples // BATCH_SIZE + 1)
y_pred = np.argmax(Y_pred, axis=1)  # this line turns out to be the problem, see the update below
print(confusion_matrix(validation_generator.classes, y_pred))
target_names = ['male', 'female']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))

I don't think that should happen. It might be something with the generators, but they look correct to me.
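One quick diagnostic (my addition, not in the original post): with class_mode='binary' the model ends in a single sigmoid unit, so predict_generator returns one probability per sample rather than one column per class. Inspecting the shape makes the problem visible:

print(Y_pred.shape)  # (n_samples, 1): one sigmoid probability per sample, not per-class scores
print(Y_pred[:5])    # raw probabilities between 0 and 1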

BATCH_SIZE = 32
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   horizontal_flip=True,
                                   validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    DATA_PATH,
    target_size=(224, 224),
    shuffle=True,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    DATA_PATH, # same directory as training data
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    shuffle=False,
    class_mode='binary',
    subset='validation') # set as validation data
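As a side check (my addition): flow_from_directory assigns class indices alphabetically from the subdirectory names, so it is worth confirming that the order of target_names matches class_indices, and that the per-class counts match the confusion matrix row sums:

import collections
print(validation_generator.class_indices)                 # indices assigned alphabetically by folder name
print(collections.Counter(validation_generator.classes))  # true counts per class; should match the row sums above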

Should I set the validation batch size to 1?

Here is the training call, if that helps.

history = model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // BATCH_SIZE,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // BATCH_SIZE,
    epochs = EPOCHS,
    verbose=1, 
    callbacks=callbacks_list)

UPDATE AND FIX FOR THIS PROBLEM:

Add this to the code in place of the argmax line:

y_pred = Y_pred.copy()   # start from the raw sigmoid probabilities
y_pred[y_pred <= 0.5] = 0.
y_pred[y_pred > 0.5] = 1.
y_pred = y_pred.ravel()  # flatten (n_samples, 1) to (n_samples,) for sklearn
# Old code
# y_pred = np.argmax(Y_pred, axis=1)  # this does not work for a single sigmoid output
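For reference, a minimal sketch (my addition) of why np.argmax fails here: with a single sigmoid output the prediction array has only one column, so the argmax along axis 1 is always 0 and every sample lands in the first class:

import numpy as np

Y_pred = np.array([[0.1], [0.9], [0.7]])  # shape (3, 1): sigmoid output for 3 samples
print(np.argmax(Y_pred, axis=1))          # [0 0 0], argmax of a one-column array is always 0
print((Y_pred[:, 0] > 0.5).astype(int))   # [0 1 1], thresholding recovers the intended labels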

Upvotes: 1

Views: 92

Answers (1)

Physicing

Reputation: 542

As far as I understand, you are doing binary classification, and I see in your code that you are using np.argmax(Y_pred, axis=1). argmax should be used for multi-class classification.

As a solution, you should try something like y_pred = [y[0] >= 0.5 for y in Y_pred] (note that it thresholds the raw predictions Y_pred, not the argmax output).

Note that I'm not sure whether this exact code works, but I am sure that np.argmax() needs to be replaced.
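A runnable version of that suggestion (a sketch, assuming Y_pred has shape (n_samples, 1) as returned by predict_generator):

import numpy as np

Y_pred = np.array([[0.2], [0.8], [0.55]])    # stand-in for the predict_generator output
y_pred = [int(y[0] >= 0.5) for y in Y_pred]  # threshold each sigmoid probability
print(y_pred)                                # [0, 1, 1]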

Upvotes: 1
