Reputation: 2960
Accuracy reported by model.evaluate() is very different from accuracy calculated from an Sklearn or TF confusion matrix
from sklearn.metrics import confusion_matrix
...
training_data, validation_data, testing_data = load_img_datasets()
# These ^ are tensorflow.python.data.ops.dataset_ops.BatchDataset
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = create_model(INPUT_SHAPE, NUM_CATEGORIES)
    optimizer = tf.keras.optimizers.Adam()
    metrics = ['accuracy']
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=metrics)
history = model.fit(training_data, epochs=epochs,
                    validation_data=validation_data)
testing_data.shuffle(len(testing_data), reshuffle_each_iteration=False)
# I think this ^ is preventing additional shuffles on access
loss, accuracy = model.evaluate(testing_data)
print(f"Accuracy: {(accuracy * 100):.2f}%")
# Prints
# Accuracy: 78.7%
y_hat = model.predict(testing_data)
y_test = np.concatenate([y for x, y in testing_data], axis=0)
c_matrix = confusion_matrix(np.argmax(y_test, axis=-1),
                            np.argmax(y_hat, axis=-1))
print(c_matrix)
# Prints result that does not agree:
# Confusion matrix:
# [[ 72 111  54  15  69]
# [ 82 100 44 16 78]
# [ 64 114 52 21 69]
# [ 71 106 54 21 68]
# [ 79 101 51 25 64]]
# Accuracy calculated from CM = 19.3%
At first, I thought that TensorFlow was shuffling testing_data on each access, so I added testing_data.shuffle(len(testing_data), reshuffle_each_iteration=False), but the results still do not agree.
I have also tried TensorFlow's confusion matrix:
y_hat = model.predict(testing_data)
y_test = np.concatenate([y for x, y in testing_data], axis=0)
true_class = tf.argmax(y_test, 1)
predicted_class = tf.argmax(y_hat, 1)
cm = tf.math.confusion_matrix(true_class, predicted_class, NUM_CATEGORIES)
print(cm)
...with a similar result.
Obviously the predicted labels must be compared with the correct labels. What am I doing wrong?
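(Worth noting: tf.data.Dataset.shuffle returns a new, shuffled dataset rather than shuffling in place, so the bare testing_data.shuffle(...) call above discards its result and leaves testing_data unchanged. A minimal sketch of that behavior:)

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# shuffle() does not modify ds; it returns a new dataset.
shuffled = ds.shuffle(5, reshuffle_each_iteration=False)

# The original dataset still yields elements in order.
print(list(ds.as_numpy_iterator()))  # always [0, 1, 2, 3, 4]

# Only the returned dataset is shuffled (same elements, new order).
print(sorted(shuffled.as_numpy_iterator()))
```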
Upvotes: 1
Views: 864
Reputation: 5079
I could not find it in the source, but it seems like TensorFlow is still shuffling the test data under the hood. You can iterate over the dataset to obtain the predictions and the true classes in the same pass:
predicted_classes = np.array([])
true_classes = np.array([])
for x, y in testing_data:
    predicted_classes = np.concatenate([predicted_classes,
                                        np.argmax(model(x), axis=-1)])
    true_classes = np.concatenate([true_classes,
                                   np.argmax(y.numpy(), axis=-1)])
model(x)
is used for faster execution. From the documentation:
Computation is done in batches. This method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using
__call__
is recommended for faster execution, e.g., model(x)
If it does not work, you can try model.predict(x)
instead.
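Once predicted_classes and true_classes are collected in the same pass, the confusion matrix and the accuracy derived from it line up with each other. A self-contained sketch using dummy arrays in place of the loop's output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Dummy aligned arrays standing in for the loop's collected output.
true_classes = np.array([0, 1, 2, 2, 1, 0])
predicted_classes = np.array([0, 1, 2, 1, 1, 0])

c_matrix = confusion_matrix(true_classes, predicted_classes)

# Accuracy is the trace (correct predictions) over the total count.
accuracy = np.trace(c_matrix) / c_matrix.sum()
print(c_matrix)
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 83.33%
```

With correctly aligned labels, this accuracy should match what model.evaluate() reports (up to rounding).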
Upvotes: 3