vman

Reputation: 173

Keras model evaluate() vs. predict_classes() gives different accuracy results

I have been using TF2.0 recently. I have trained a simple CNN model (with Keras Sequential API) for binary classification of images. I have used tf.data.Dataset for loading the images from disk. Actually the model got pretty good accuracy, with train binary_accuracy: 0.9831 and validation binary_accuracy: 0.9494.

I tried evaluating the model using model.evaluate(). It gave a binary accuracy of 0.9460. But when I calculated binary accuracy manually using predict_classes(), I got around 0.384. I don't know what the issue is. Please help me out.

I have added the code I used to compile and train the model, as well as the code for evaluating it.

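import time
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

train_data = tf.data.Dataset.from_tensor_slices((tf.constant(train_x),tf.constant(train_y)))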
val_data = tf.data.Dataset.from_tensor_slices((tf.constant(val_x),tf.constant(val_y)))

train_data = train_data.map(preproc).shuffle(buffer_size=100).batch(BATCH_SIZE)
val_data = val_data.map(preproc).shuffle(buffer_size=100).batch(BATCH_SIZE)

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.BinaryAccuracy()])

checkpointer = ModelCheckpoint(filepath='weights.hdf5', verbose=1, save_best_only=True)

time1 = time.time()
history = model.fit(train_data.repeat(),
                    epochs=EPOCHS,
                    steps_per_epoch=STEPS_PER_EPOCH,
                    validation_data=val_data.repeat(),
                    validation_steps=VAL_STEPS,
                    callbacks=[checkpointer])

29/29 [==============================] - 116s 4s/step - loss: 0.0634 - binary_accuracy: 0.9826 - val_loss: 0.1559 - val_binary_accuracy: 0.9494

Now testing with unseen data

test_data = tf.data.Dataset.from_tensor_slices((tf.constant(unseen_faces),tf.constant(unseen_labels)))
test_data = test_data.map(preproc).batch(BATCH_SIZE)

model.evaluate(test_data)

9/9 [==============================] - 19s 2s/step - loss: 0.1689 - binary_accuracy: 0.9460

When I calculate accuracy for the same model on the same dataset using model.predict_classes, the results are far from the evaluation report. The binary accuracy comes out at around 38%.

Edit 1: Pre-processing function I used while training

def preproc(file_path,label):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img)
    img = (tf.cast(img, tf.float32)/127.5) - 1
    return tf.image.resize(img,(IMAGE_HEIGHT,IMAGE_WIDTH)),label

Manual prediction code

import glob
from sklearn.metrics import classification_report

# Preprocessing for test images: same as preproc, but without a label
def preproc_test(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img)
    img = (tf.cast(img, tf.float32)/127.5) - 1
    return tf.image.resize(img,(IMAGE_HEIGHT,IMAGE_WIDTH))

unseen_faces = []
unseen_labels = []
for im_path in glob.glob('dataset/data/*'):
    unseen_faces.append(im_path)
    if 'real' in im_path:
        unseen_labels.append(0)
    else:
        unseen_labels.append(1)

unseen_faces = list(map(preproc_test,unseen_faces))
unseen_faces = tf.stack(unseen_faces)

predicted_labels = model.predict_classes(unseen_faces)

print(classification_report(unseen_labels, predicted_labels, labels=[0, 1]))

              precision    recall  f1-score   support

           0       0.54      0.41      0.47        34
           1       0.41      0.54      0.47        26

    accuracy                           0.47        60
   macro avg       0.48      0.48      0.47        60
weighted avg       0.48      0.47      0.47        60


Upvotes: 2

Views: 8381

Answers (2)

Qin Heyang

Reputation: 1674

In my case it was because the shapes of my ground truth and predicted results were different. I was loading data with (x_train, y_train), (x_test, y_test) = cifar10.load_data(), where y_train is a 2D ndarray of shape (50000, 1), yet the prediction from model.predict_classes has shape (50000,). Comparing them directly with np.mean(pred == y_train) broadcasts the two arrays to a (50000, 50000) comparison and gives 0.1, which is not the accuracy. np.mean(pred == np.squeeze(y_train)) gives the correct result instead.
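The shape mismatch can be reproduced with a few lines of NumPy. This is a minimal sketch with made-up labels: the arrays `y_true` and `pred` are hypothetical stand-ins for `y_train` and the output of `model.predict_classes`, not the CIFAR-10 data itself.

```python
import numpy as np

# Hypothetical stand-ins: labels as a column vector, predictions as a flat vector.
y_true = np.array([[0], [1], [2], [1]])  # shape (4, 1), like y_train from cifar10.load_data()
pred = np.array([0, 1, 2, 1])            # shape (4,), like the output of predict_classes

# Comparing (4,) with (4, 1) broadcasts to a (4, 4) matrix of pairwise comparisons,
# so the mean is no longer the per-sample accuracy.
wrong = np.mean(pred == y_true)

# Squeezing the labels to shape (4,) compares element by element.
right = np.mean(pred == np.squeeze(y_true))

print((pred == y_true).shape)  # (4, 4)
print(wrong, right)            # 0.375 1.0
```

Even though every prediction here is correct, the broadcast comparison reports 0.375, so it is worth printing both shapes before computing accuracy by hand.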

Upvotes: 1

Rishab P

Reputation: 1633

Your model is doing well during both training and testing. Evaluation accuracy is computed from the model's predictions, so you are probably making a logical mistake somewhere in how you use model.predict_classes(). Please check that you are predicting with the trained model weights and not with a randomly initialized model.

evaluate: Returns the loss value and metric values for the model on the data you pass in. model.evaluate() is for evaluating your trained model. Its output is the loss and metrics such as accuracy, not predictions for your input data.

predict: Generates output predictions for the input samples. model.predict() actually predicts, and its output is the predicted target values for your input data.
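To make the difference concrete: for a model with a single sigmoid output, model.predict() returns probabilities of shape (N, 1), and class labels have to be derived by thresholding before they can be compared with the ground truth. The sketch below uses a hypothetical array `probs` in place of a real model.predict() call and assumes a 0.5 threshold, which is also the default threshold of tf.keras.metrics.BinaryAccuracy. The labels are made up for illustration.

```python
import numpy as np

# Hypothetical output of model.predict(x) for a sigmoid model: shape (N, 1).
probs = np.array([[0.91], [0.08], [0.65], [0.30], [0.49]])
labels = np.array([1, 0, 1, 1, 0])  # made-up ground truth, shape (N,)

# Threshold at 0.5 (assumed) and flatten to shape (N,) so the comparison is element-wise.
predicted = (probs > 0.5).astype("int32").ravel()

accuracy = np.mean(predicted == labels)
print(predicted)  # [1 0 1 0 0]
print(accuracy)   # 0.8
```

If a manual accuracy computed this way still disagrees with model.evaluate(), the inputs or labels fed to the two code paths are likely not the same.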

P.S.: For a binary classification problem with balanced classes, accuracy below 50% is worse than a random guess.
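As a quick sanity check on that baseline, the class counts from the classification report in the question (34 samples of class 0, 26 of class 1) give the accuracy of a trivial classifier that always predicts the majority class. This is just arithmetic on the reported numbers; the variable names are illustrative.

```python
# Class supports taken from the classification report in the question.
support = {0: 34, 1: 26}
total = sum(support.values())

# Accuracy of always predicting the most frequent class.
majority_baseline = max(support.values()) / total

reported_accuracy = 0.47  # accuracy from the same report
print(round(majority_baseline, 3))            # 0.567
print(reported_accuracy < majority_baseline)  # True
```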

Upvotes: 1
