Divyansh Shah

Reputation: 33

Why does shuffle=False on the validation set give a better confusion matrix and classification report than shuffle=True?

If I set shuffle=False when creating a test or validation dataset,

test_dataset = test_image_gen.flow_from_directory(test_path,
                                          target_size=(125,125),
                                          batch_size=batch_size,
                                          class_mode='binary',
                                          shuffle=False)

and then make predictions using predict_generator, I get a much better confusion matrix and classification report:

[[947  53]
 [ 25 975]]



              precision    recall  f1-score   support

           0       0.97      0.95      0.96      1000
           1       0.95      0.97      0.96      1000

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000

But if I set shuffle=True, the results drop to roughly chance level:

test_dataset = test_image_gen.flow_from_directory(test_path,
                                          target_size=(125,125),
                                          batch_size=batch_size,
                                          class_mode='binary',
                                          shuffle=True)

[[495 505]
 [477 523]]



              precision    recall  f1-score   support

           0       0.51      0.49      0.50      1000
           1       0.51      0.52      0.52      1000

    accuracy                           0.51      2000
   macro avg       0.51      0.51      0.51      2000
weighted avg       0.51      0.51      0.51      2000   

Upvotes: 3

Views: 5323

Answers (1)

Kartik Sikka

Reputation: 107

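In your case, the problem with shuffle=True is that the generator returns the validation samples in a shuffled order, so the predictions no longer line up with the labels you compare them against (presumably labels taken in directory order, such as the generator's classes attribute). The predictions themselves may be correct, but they are matched against the wrong indices, which makes the confusion matrix and classification report misleading. That is what happened here: about 0.51 accuracy on a balanced two-class set is what randomly paired labels and predictions would give.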
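To see the effect without a model or image data, here is a minimal simulation using only the standard library. Everything in it is an assumption made for illustration: the "model" is a random process that is right about 96% of the time, and the shuffle is a plain permutation standing in for what the generator does.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Labels in directory order: 1000 samples of class 0, then 1000 of class 1,
# mirroring the balanced test set in the question.
labels = [0] * 1000 + [1] * 1000

# Hypothetical model that predicts the right class about 96% of the time.
preds_in_order = [y if rng.random() < 0.96 else 1 - y for y in labels]

# Stand-in for shuffle=True: predictions come back in a permuted order,
# while the labels we compare against stay in directory order.
order = list(range(len(labels)))
rng.shuffle(order)
preds_shuffled = [preds_in_order[i] for i in order]

aligned_acc = accuracy(labels, preds_in_order)
misaligned_acc = accuracy(labels, preds_shuffled)

# Applying the same permutation to the labels restores the original score,
# which shows the predictions were never the problem.
labels_shuffled = [labels[i] for i in order]
realigned_acc = accuracy(labels_shuffled, preds_shuffled)

print(f"aligned:    {aligned_acc:.2f}")
print(f"misaligned: {misaligned_acc:.2f}")
print(f"realigned:  {realigned_acc:.2f}")
```

The misaligned score lands near chance because each prediction is paired with the label of an unrelated sample, and half of those labels happen to match on a balanced set.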

Always use shuffle=True on the training set and shuffle=False on the validation and test sets.
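If the order of the batches cannot be controlled, another way to keep things aligned is to read the labels from the same batches that are fed to the model. The helper below is a sketch of that idea, not part of Keras: collect_labels_and_predictions, predict_batch, and the toy data are hypothetical names introduced for illustration.

```python
from itertools import islice

def collect_labels_and_predictions(batches, predict_batch, steps):
    """Return (y_true, y_pred) gathered from the same `steps` batches.

    batches       -- iterable yielding (inputs, labels) pairs
    predict_batch -- hypothetical callable: inputs -> list of predicted labels
    steps         -- number of batches to consume (the iterable may be endless)
    """
    y_true, y_pred = [], []
    for inputs, labels in islice(batches, steps):
        y_true.extend(labels)
        y_pred.extend(predict_batch(inputs))
    return y_true, y_pred

# Toy stand-ins: each "image" is a number whose label is 1 when it is >= 5.
# The samples are deliberately listed in a mixed-up order.
samples = [(x, int(x >= 5)) for x in [7, 1, 9, 3, 5, 0, 8, 2]]
toy_batches = [
    ([x for x, _ in samples[i:i + 4]], [y for _, y in samples[i:i + 4]])
    for i in range(0, len(samples), 4)
]

def toy_predict(inputs):
    """Toy model that applies the labelling rule directly."""
    return [int(x >= 5) for x in inputs]

y_true, y_pred = collect_labels_and_predictions(toy_batches, toy_predict, steps=2)
print(y_true)
print(y_pred)
```

With a Keras generator, batches would presumably be the generator itself, steps its length in batches, and predict_batch a thin wrapper that thresholds the model's output. Because labels and predictions are taken from the same batch, the result does not depend on the shuffle setting.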

Original answer: accuracy-reduced-when-shuffle-set-to-true-in-keras-fit-generator

Upvotes: 3
