Reputation: 377
I'm building a Keras model to classify cats and dogs. I used transfer learning with bottleneck features and fine-tuning of a VGG model. I now get a very good validation accuracy of about 97%, but when I run predictions I get very bad results in the classification report and confusion matrix. What could be the problem?
Here is the fine-tuning code and the results I get:
base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(150,150,3))
print('Model loaded.')
# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='sigmoid'))
# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)
# add the model on top of the convolutional base
# model.add(top_model)
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))
# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
for layer in model.layers[:15]:
    layer.trainable = False
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
model.summary()
# fine-tune the model
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=2)
scores = model.evaluate_generator(generator=validation_generator,
                                  steps=nb_validation_samples // batch_size)
print("Accuracy = ", scores[1])
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
print('Classification Report')
target_names = ['Cats', 'Dogs']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
model.save("model_tuned.h5")
Accuracy =  0.974375
Confusion Matrix
[[186 214]
 [199 201]]
Classification Report
              precision    recall  f1-score   support

        Cats       0.48      0.47      0.47       400
        Dogs       0.48      0.50      0.49       400

   micro avg       0.48      0.48      0.48       800
   macro avg       0.48      0.48      0.48       800
weighted avg       0.48      0.48      0.48       800
Upvotes: 5
Views: 6027
Reputation: 83
I am doing skin cancer classification and the data are balanced (dataset: https://kaggle.com/hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images). Now consider the confusion matrix below and the reported accuracy: they still do not match, so this cannot be a case of data imbalance.
test_pred = model.predict(test_generator)
The output accuracy is 89%, but the confusion matrix is
array([[267, 271],
       [233, 229]])
which does not match at all.
Upvotes: 1
Reputation: 1
Somehow, Keras' predict_generator() does not work as expected, so I would rather loop through all test images one by one and get the prediction for each image individually. I am using PlaidML Keras as my backend, and to get predictions I use the following code:
import os
from PIL import Image
import keras
import numpy

# `model` is the trained Keras model, loaded beforehand
print("Prediction result:")
dir = "/path/to/test/images"
files = os.listdir(dir)
correct = 0
total = 0
# dictionary mapping class indices to labels
classes = {
    0: 'This is Cat',
    1: 'This is Dog',
}
for file_name in files:
    total += 1
    image = Image.open(dir + "/" + file_name).convert('RGB')
    image = image.resize((100, 100))
    image = numpy.expand_dims(image, axis=0)
    image = numpy.array(image)
    image = image / 255
    pred = model.predict_classes([image])[0]
    sign = classes[pred]
    # compare case-insensitively, since the labels above are capitalized
    if ("cat" in file_name.lower()) and ("cat" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
    elif ("dog" in file_name.lower()) and ("dog" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
print("accuracy: ", (correct / total))
Upvotes: 0
Reputation: 21
There are usually two reasons for this problem:
The most common one is running prediction on images preprocessed differently from training (for example, forgetting to normalize, or mixing up height and width). That does not seem to be the case here.
The second one is when there are many more samples of one class than of the others. Say there are 1000 samples of class A and 100 of class B: if the model only ever guesses A, it will be correct about 90% of the time. This is a "local minimum", and even though validation reports roughly 0.9 accuracy, the model is useless in practice.
In short, are you dealing with imbalanced data? It is sometimes hard to avoid such local minima in that case; a quick check of the class distribution (see the sketch below) will tell you. Could this be the issue here?
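As a minimal sketch (assuming a flow_from_directory generator named train_generator, as in the question), you can count the samples per class and, if the counts are skewed, pass class weights when fitting:
import numpy as np
# count how many training samples each class has
counts = np.bincount(train_generator.classes)
print(dict(zip(train_generator.class_indices, counts)))
# weight the loss inversely to class frequency (only needed if the counts are skewed)
total = counts.sum()
class_weight = {i: total / (len(counts) * c) for i, c in enumerate(counts)}
# hypothetical call: model.fit_generator(train_generator, ..., class_weight=class_weight)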
Upvotes: 0
Reputation: 1127
I think the problem is that you should add shuffle=False to your validation generator:
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)
The problem is that the default behaviour is to shuffle the images, so the label order of validation_generator.classes does not match the order in which the generator yields the predictions; see the sketch below.
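A minimal sketch of the aligned evaluation (assuming the generator was created with shuffle=False and using the variable names from the question): reset the generator before predicting, so the i-th prediction corresponds to the i-th entry of validation_generator.classes.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
# start from the beginning of the (unshuffled) validation set
validation_generator.reset()
# assumes batch_size divides nb_validation_samples evenly, as in the question
Y_pred = model.predict_generator(validation_generator,
                                 steps=nb_validation_samples // batch_size)
y_pred = np.argmax(Y_pred, axis=1)
# predictions are now in the same order as the ground-truth labels
print(confusion_matrix(validation_generator.classes, y_pred))
print(classification_report(validation_generator.classes, y_pred,
                            target_names=['Cats', 'Dogs']))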
Upvotes: 6
Reputation: 56357
There are two issues with your model. First, you need to use a softmax activation if you have more than one output neuron:
top_model.add(Dense(2, activation='softmax'))
Second, you have to use the categorical_crossentropy loss; binary_crossentropy is only for the case of a single output neuron with a sigmoid activation.
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
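Alternatively (a sketch, not part of this answer, assuming the rest of the question's setup), you could keep binary_crossentropy by using a single sigmoid output and class_mode='binary' in the generators:
# single-neuron alternative: replaces Dense(2, activation='softmax')
top_model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
# with class_mode='binary', the output is the probability of class 1
# (dogs, assuming alphabetical directory order), e.g.:
# y_pred = (model.predict_generator(validation_generator, steps) > 0.5).astype(int).ravel()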
Upvotes: 1