beepboop
beepboop

Reputation: 41

Discrepancy in the results of model.evaluate and model.predict in Keras

I'm performing multi-class classification with three class labels in Keras. During training, both the training and validation losses were decreasing and accuracies were increasing. After training, I tested out the model on the training set as a sanity check and there seems to be a huge discrepancy between model.evaluate and model.predict. I did find some solutions that seemed to indicate this was an issue with BatchNorm and Dropout layers, but that shouldn't result in such a huge difference. The relevant code is as shown below.

model=Sequential()
model.add(Conv2D(32, (3, 3), padding="same",input_shape=input_shape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
.
.
model.add(Dense(n_classes))
model.add(Activation("softmax"))

optimizer=Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['categorical_accuracy'])

datagen = ImageDataGenerator(horizontal_flip=True, fill_mode='nearest')

train_datagen = datagen.flow(X_train, y_train, batch_size=batch_size)
val_datagen = ImageDataGenerator().flow(X_val, y_val, batch_size=batch_size)

history=model.fit(train_datagen, steps_per_epoch=math.ceil(nb_train_samples/batch_size), verbose=2, epochs=50, validation_data=val_datagen, validation_steps=math.ceil(nb_validation_samples/batch_size), class_weight=d_class_weights)

print('model.evaluate accuracy: ', model.evaluate(X_train, y_train, batch_size=batch_size)[1])
test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size), steps=math.ceil(nb_train_samples/batch_size))
test_result=np.array(test_pred)
test_result = np.zeros(test_result.shape)
test_result[np.arange(len(test_pred)), test_pred.argmax(1)] = 1
total=0
count=0
for i in range(test_result.shape[0]):
    total+=1
    count+=(test_result[i]==y_train[i]).all()
print('model.predict accuracy: ', count/total)

The output I get is as follows:-

66/66 [==============================] - 12s 177ms/step - loss: 0.0010 - categorical_accuracy: 1.0000
model.evaluate accuracy:  1.0
model.predict accuracy:  0.42138063279002874

I've been trying to solve this for a while now and have failed to find anything. I'm already using categorical_crossentropy, categorical_accuracy, and softmax activation in the last layer, so I have no idea what's wrong. Any help would be greatly appreciated!

Upvotes: 1

Views: 765

Answers (1)

beepboop
beepboop

Reputation: 41

I finally found the solution, turns out that I'm only passing X_train into the predict function, and the shuffle parameter is True by default, because of which the predictions didn't correspond to the ground truth. Setting shuffle=False solved the problem.

test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size, shuffle=False), steps=math.ceil(nb_train_samples/batch_size))

Upvotes: 3

Related Questions