Nixiam

Reputation: 43

TensorFlow Dataset shuffle impacts validation/training accuracy and causes ambiguous behavior

I am struggling with training a neural network that uses tf.data.Dataset as input.

What I find is that if I call .shuffle() before splitting the entire dataset into train, val, and test sets, the accuracy on val (during training) and test (in evaluate) is 91%, but when I run .evaluate() on the test set several times, the accuracy and loss metrics change every time. The same happens with .predict() on the test set: the predicted classes change on every call.

This is the output of the training, evaluate, and predict process:

    total_record: 93166 - trainin_size: 74534 - val_size: 9316 - test_size: 9316
Epoch 1/5
145/145 [==============================] - 42s 273ms/step - loss: 1.7143 - sparse_categorical_accuracy: 0.4051 - val_loss: 1.4997 - val_sparse_categorical_accuracy: 0.4885
Epoch 2/5
145/145 [==============================] - 40s 277ms/step - loss: 0.7571 - sparse_categorical_accuracy: 0.7505 - val_loss: 1.1634 - val_sparse_categorical_accuracy: 0.6050
Epoch 3/5
145/145 [==============================] - 41s 281ms/step - loss: 0.4894 - sparse_categorical_accuracy: 0.8223 - val_loss: 0.7628 - val_sparse_categorical_accuracy: 0.7444
Epoch 4/5
145/145 [==============================] - 38s 258ms/step - loss: 0.3417 - sparse_categorical_accuracy: 0.8656 - val_loss: 0.4236 - val_sparse_categorical_accuracy: 0.8579
Epoch 5/5
145/145 [==============================] - 40s 271ms/step - loss: 0.2660 - sparse_categorical_accuracy: 0.8926 - val_loss: 0.2807 - val_sparse_categorical_accuracy: 0.9105

accr = model.evaluate(test_set)
19/19 [==============================] - 1s 39ms/step - loss: 0.2622 - sparse_categorical_accuracy: 0.9153

accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2649 - sparse_categorical_accuracy: 0.9170

accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2726 - sparse_categorical_accuracy: 0.9141

accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2692 - sparse_categorical_accuracy: 0.9166

pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[41]: array([0, 1, 5, ..., 2, 0, 1])

pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[42]: array([2, 3, 1, ..., 1, 2, 0])

pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[43]: array([1, 2, 4, ..., 1, 3, 0])

pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[44]: array([0, 3, 1, ..., 0, 5, 4])

So I tried to apply .shuffle() after the split, and only on the training and validation sets (commenting out the main .shuffle() and uncommenting the shuffle calls on training_set and validation_set).

But in this case, I find that the network starts overfitting after just 5 epochs (with the previous setup, the callbacks stopped training around the 30th epoch with 94% validation accuracy), and the validation accuracy stays around 75% from the 2nd epoch onward.

However, in this case, if I run .evaluate() and .predict() on the test set, to which .shuffle() has not been applied, the metrics and classes remain identical on every call.

Why this behavior? And above all, which is the correct approach, and what is the real accuracy of the model?

Thanks

This is the code for the process:

""" ### Make tf.data.Dataset """

dataset = tf.data.Dataset.from_tensor_slices((
    {"features_emb_subj": features_emb_subj,
     "features_emb_snip": features_emb_snip,
     "features_emb_fromcat": features_emb_fromcat,
     "features_dense": features_dense,
     "features_emb_user": features_emb_user},
    cat_labels))

# Shuffle the whole dataset (buffer = full length), reshuffled on every iteration
dataset = dataset.shuffle(len(features_dense), reshuffle_each_iteration=True)


""" ### Split in train,val,test """

train_size = int(0.8 * len(features_dense))
val_size = int(0.10 * len(features_dense))
test_size = int(0.10 * len(features_dense))

test_set = dataset.take(test_size)
validation_set = dataset.skip(test_size).take(val_size)
training_set = dataset.skip(test_size + val_size)

test_set = test_set.batch(BATCH_SIZE, drop_remainder=False)
#validation_set = validation_set.shuffle(val_size, reshuffle_each_iteration=True)
validation_set = validation_set.batch(BATCH_SIZE, drop_remainder=False)
#training_set = training_set.shuffle(train_size, reshuffle_each_iteration=True)
training_set = training_set.batch(BATCH_SIZE, drop_remainder=True)


"""### Train model """

callbacks = [EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001, restore_best_weights=True)]

history = model.fit(  training_set,
                      epochs = 5,
                      validation_data = validation_set,
                      callbacks=callbacks,
                      class_weight = setClassWeight(cat_labels),
                      verbose = 1)

"""### Evaluate model """
accr = model.evaluate(test_set)

"""### Predict test_test """
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class

Upvotes: 2

Views: 1580

Answers (1)

snowdd1

Reputation: 156

In the comments on this question you can see that shuffle() is applied to the base dataset, and it therefore propagates to the train, test, and validation sets, since they are just references into that shuffled pipeline.
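
A minimal sketch with toy data (not the question's pipeline) of what that means in practice: because reshuffle_each_iteration=True re-runs the shuffle every time the pipeline is iterated, the elements produced by take() differ between passes, so each evaluate()/predict() call effectively sees a different "test set":

import tensorflow as tf

# Toy base dataset, shuffled with a full-size buffer and reshuffled on every pass
ds = tf.data.Dataset.range(10).shuffle(10, reshuffle_each_iteration=True)
test = ds.take(3)

print(list(test.as_numpy_iterator()))  # e.g. [4, 7, 1]
print(list(test.as_numpy_iterator()))  # e.g. [9, 2, 5] - different elements on the second pass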

I would recommend creating 3 distinct datasets: use (e.g.) sklearn.model_selection.train_test_split on the original data first, and then call tf.data.Dataset.from_tensor_slices on each of those splits, so you can apply shuffle() to the training dataset only.
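
A rough sketch of that idea, assuming the arrays from the question (features_emb_subj, ..., cat_labels) are NumPy arrays of equal length and BATCH_SIZE is already defined; splitting indices (rather than each array separately) keeps the multi-input dict manageable:

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Split indices once, so all feature arrays stay aligned with the labels
idx = np.arange(len(cat_labels))
train_idx, test_idx = train_test_split(idx, test_size=0.10, random_state=42)
train_idx, val_idx = train_test_split(train_idx, test_size=0.111, random_state=42)  # ~10% of the total

def make_ds(ix, shuffle=False):
    ds = tf.data.Dataset.from_tensor_slices((
        {"features_emb_subj": features_emb_subj[ix],
         "features_emb_snip": features_emb_snip[ix],
         "features_emb_fromcat": features_emb_fromcat[ix],
         "features_dense": features_dense[ix],
         "features_emb_user": features_emb_user[ix]},
        cat_labels[ix]))
    if shuffle:
        # Shuffle only the training data; val/test stay fixed and reproducible
        ds = ds.shuffle(len(ix), reshuffle_each_iteration=True)
    return ds.batch(BATCH_SIZE)

training_set = make_ds(train_idx, shuffle=True)
validation_set = make_ds(val_idx)
test_set = make_ds(test_idx)

With fixed validation and test splits like this, repeated evaluate()/predict() calls should return the same metrics and class predictions; if the classes are imbalanced, passing stratify (with the labels of the array being split) to train_test_split is also worth considering.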

Upvotes: 2
