Reputation: 377
I'm building a Keras model to classify cats and dogs. I used transfer learning with bottleneck features, followed by fine-tuning of a VGG16 model. I now get very good validation accuracy (about 97%), but when I predict on the validation data, the classification report and confusion matrix are very poor. What could be the problem?
Here is the fine-tuning code and the results I get:
base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(150,150,3))
print('Model loaded.')
# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Dense(256, activation='relu'))
top_model.add(Dense(2, activation='sigmoid'))
# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
# add the model on top of the convolutional base
# model.add(top_model)
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))
# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
for layer in model.layers[:15]:
    layer.trainable = False
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    ...)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    ...,
    target_size=(img_height, img_width),
    ...)
validation_generator = test_datagen.flow_from_directory(
    ...,
    target_size=(img_height, img_width),
    ...)
# fine-tune the model
model.fit_generator(
    ...,
    steps_per_epoch=nb_train_samples // batch_size,
    ...,
    validation_steps=nb_validation_samples // batch_size)
scores = model.evaluate_generator(
    ...,
    steps=nb_validation_samples // batch_size)
print("Accuracy = ", scores[1])
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
print('Classification Report')
target_names = ['Cats', 'Dogs']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
model.save("model_tuned.h5")
Accuracy = 0.974375
Confusion Matrix
[[186 214]
 [199 201]]
Classification Report
              precision    recall  f1-score   support

        Cats       0.48      0.47      0.47       400
        Dogs       0.48      0.50      0.49       400

   micro avg       0.48      0.48      0.48       800
   macro avg       0.48      0.48      0.48       800
weighted avg       0.48      0.48      0.48       800
Upvotes: 5
Views: 6027
Reputation: 83
I am doing skin cancer classification and my data are balanced. Consider the accuracy and the confusion matrix below: they still do not match, and it cannot be a case of data imbalance.
test_pred = model.predict(test_generator)
The reported accuracy is 89%, but the confusion matrix is
array([[267, 271],
       [233, 229]])
The two do not match at all.
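To put a number on the mismatch, the accuracy implied by a confusion matrix can be read off its diagonal. The sketch below is an illustration only: the helper name accuracy_from_confusion is hypothetical, and the matrix is the one quoted in this post.

```python
# Accuracy implied by a confusion matrix: correct predictions lie on the diagonal.
def accuracy_from_confusion(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Matrix quoted in this post.
cm = [[267, 271],
      [233, 229]]
implied = accuracy_from_confusion(cm)
print(f"accuracy implied by the matrix: {implied:.3f}")
```

An implied accuracy near 50% on balanced two-class data is what random pairing of labels and predictions would give, which points at the comparison rather than at the model.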
Upvotes: 1
Reputation: 1
Somehow, the predict_generator() method of a Keras model does not seem to work as expected. I would rather loop over all test images one by one and get the prediction for each image in turn. I am using PlaidML as my Keras backend, and I use the following code to get predictions.
import os
from PIL import Image
import keras
import numpy

print("Prediction result:")
dir = "/path/to/test/images"
files = os.listdir(dir)
correct = 0
total = 0
# dictionary mapping each class index to a label
classes = {
    0: 'This is Cat',
    1: 'This is Dog',
}
for file_name in files:
    total += 1
    # load the image and preprocess it the same way as during training
    image = Image.open(dir + "/" + file_name).convert('RGB')
    image = image.resize((100, 100))
    image = numpy.expand_dims(image, axis=0)
    image = numpy.array(image)
    image = image / 255
    pred = model.predict_classes([image])[0]
    sign = classes[pred]
    # compare case-insensitively: the labels are capitalized ("Cat", "Dog")
    if ("cat" in file_name) and ("cat" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
    elif ("dog" in file_name) and ("dog" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
print("accuracy: ", (correct / total))
Upvotes: 0
Reputation: 21
There are usually two reasons for this problem:
The most common one is that the model is applied at prediction time to images in a different form from the ones it was trained on (for example, normalization was forgotten, or height and width were mixed up). This does not seem to be the case here.
The second one is that there are many more samples of one class than of the others. Say there are 1000 samples of A and 100 samples of B. If the model only guesses A, it will be correct about 91% of the time (1000 out of 1100). Training can get stuck in this kind of trivial solution, loosely a "local minimum" of the loss, and even if validation yields about 0.9 accuracy, the model will be useless in practice.
In short, are you dealing with imbalanced data? It is sometimes hard to avoid such local minima in that case. Could this be the issue here?
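The arithmetic behind the imbalance argument can be checked in a few lines. This is a sketch with made-up labels matching the 1000-versus-100 example, not the asker's data (the classification report in the question shows a support of 400 for each class, so that dataset looks balanced).

```python
# Hypothetical labels matching the example: 1000 samples of A, 100 of B.
y_true = ["A"] * 1000 + ["B"] * 100
# A degenerate "model" that always predicts the majority class.
y_pred = ["A"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(label):
    # Fraction of the samples of `label` that were predicted as `label`.
    hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    return hits / y_true.count(label)

print(f"accuracy = {accuracy:.3f}")
print(f"recall A = {recall('A'):.2f}, recall B = {recall('B'):.2f}")
```

Accuracy alone hides the failure; the per-class recall exposes it, which is why a classification report is worth checking alongside the accuracy score.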
Upvotes: 0
Reputation: 1127
I think the problem is that you should add shuffle=False to your validation generator:
validation_generator = test_datagen.flow_from_directory(
    ...,
    target_size=(img_height, img_width),
    ...,
    shuffle=False)
The problem is that the default behaviour is to shuffle the images, so the label order of validation_generator.classes doesn't match the order of the predictions returned by the generator.
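The effect of a shuffled generator can be reproduced without Keras. The following is a rough simulation under stated assumptions: noisy_model is a hypothetical stand-in for a classifier that is right about 97% of the time, and served_order plays the role of the random order in which a shuffling generator yields samples.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 800
# Labels in file order, the way a directory iterator's `.classes` lists them:
# all of class 0 first, then all of class 1.
labels_in_file_order = [0] * (n // 2) + [1] * (n // 2)

def noisy_model(label, error_rate=0.03):
    # Stand-in for a real classifier: returns the true label about 97% of the time.
    return label if random.random() > error_rate else 1 - label

# A shuffling generator serves the samples in a random order...
served_order = list(range(n))
random.shuffle(served_order)
# ...so predictions come back in that order, not in file order.
preds_in_served_order = [noisy_model(labels_in_file_order[i]) for i in served_order]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Correct comparison: labels re-ordered to match the order the samples were served in.
aligned = accuracy([labels_in_file_order[i] for i in served_order], preds_in_served_order)
# Mistaken comparison: shuffled predictions against file-order labels.
misaligned = accuracy(labels_in_file_order, preds_in_served_order)

print(f"aligned accuracy:    {aligned:.3f}")
print(f"misaligned accuracy: {misaligned:.3f}")
```

The model itself is the same in both comparisons; only the pairing of labels and predictions changes, which is enough to drag a strong classifier down to chance level in the report.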
Upvotes: 6
Reputation: 56357
There are two issues with your model. First, you need to use softmax activation if you have more than one output neuron for mutually exclusive classes:
top_model.add(Dense(2, activation='softmax'))
Second, you have to use the categorical_crossentropy loss; binary crossentropy is only for the case of a single output neuron with sigmoid activation:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
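To see why the activation matters with two output neurons, here is a small standard-library sketch. The logits are made up for illustration, and the helper functions are written out by hand rather than taken from Keras: sigmoid scores each output independently, while softmax normalizes them into a distribution that categorical cross-entropy expects.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    # -sum(y_i * log(p_i)) over the classes.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

logits = [2.0, 1.0]  # hypothetical raw outputs of the last Dense layer
independent = [sigmoid(v) for v in logits]
normalized = softmax(logits)
loss = categorical_crossentropy([1, 0], normalized)

print("sigmoid:", independent, "sum =", sum(independent))
print("softmax:", normalized, "sum =", sum(normalized))
print("loss for true class 0:", loss)
```

With sigmoid the two outputs do not sum to 1, so they cannot be read as probabilities of mutually exclusive classes; with softmax they can.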
Upvotes: 1