Reputation: 33
If I set shuffle=False while creating the test or validation dataset,
test_dataset = test_image_gen.flow_from_directory(test_path,
                                                  target_size=(125,125),
                                                  batch_size=batch_size,
                                                  class_mode='binary',
                                                  shuffle=False)
and then make predictions with predict_generator, I get a good confusion matrix and classification report:
[[947  53]
 [ 25 975]]

              precision    recall  f1-score   support

           0       0.97      0.95      0.96      1000
           1       0.95      0.97      0.96      1000

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000
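The evaluation code isn't shown, so here is a minimal sketch of how metrics like the above are typically computed, assuming a trained binary model named model, sklearn for the metrics, and a 0.5 decision threshold:

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Ground-truth labels, in the order flow_from_directory found the files
y_true = test_dataset.classes

# Predicted probabilities for the positive class, shape (n_samples, 1)
test_dataset.reset()
probs = model.predict_generator(test_dataset)
y_pred = (probs > 0.5).astype(int).ravel()

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))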
But if I set shuffle=True, the results are very disheartening:
test_dataset = test_image_gen.flow_from_directory(test_path,
                                                  target_size=(125,125),
                                                  batch_size=batch_size,
                                                  class_mode='binary',
                                                  shuffle=True)
[[495 505]
 [477 523]]

              precision    recall  f1-score   support

           0       0.51      0.49      0.50      1000
           1       0.51      0.52      0.52      1000

    accuracy                           0.51      2000
   macro avg       0.51      0.51      0.51      2000
weighted avg       0.51      0.51      0.51      2000
Upvotes: 3
Views: 5323
Reputation: 107
In your case, the problem with setting shuffle=True is that shuffling the test (or validation) set changes the order in which the generator yields the images, while the ground-truth labels are still taken in the original, unshuffled order. The predictions themselves may well be correct, but they are compared against the wrong indices, which produces misleading results, exactly as happened in your case.
Always use shuffle=True on the training set and shuffle=False on the validation and test sets.
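To see why in code: test_dataset.classes is always stored in directory order, no matter what shuffle is set to, while a shuffled generator yields the images in a different random order, so the i-th prediction no longer corresponds to the i-th label. A minimal sketch, reusing the names from the question and assuming sklearn metrics and a 0.5 threshold:

import numpy as np
from sklearn.metrics import confusion_matrix

# With shuffle=True the predictions arrive in shuffled order,
# but test_dataset.classes keeps the original directory order:
y_true = test_dataset.classes
probs = model.predict_generator(test_dataset)
y_pred = (probs > 0.5).astype(int).ravel()
confusion_matrix(y_true, y_pred)   # misaligned labels -> roughly 50% accuracy

# If you really need shuffle=True, collect the labels batch by batch
# so they stay aligned with the images they belong to:
y_true, y_pred = [], []
for _ in range(len(test_dataset)):            # one full pass over the data
    x_batch, y_batch = next(test_dataset)     # labels travel with their images
    y_true.append(y_batch)
    y_pred.append((model.predict(x_batch) > 0.5).astype(int).ravel())
y_true = np.concatenate(y_true)
y_pred = np.concatenate(y_pred)
confusion_matrix(y_true, y_pred)   # now the alignment is correct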
Original answer: accuracy-reduced-when-shuffle-set-to-true-in-keras-fit-generator
Upvotes: 3