Reputation: 2478
I am running code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification. I am using EarlyStopping from TensorFlow 2.0, and the callback I have for it is:
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3,
        min_delta=0.001
    )
]
According to the EarlyStopping - TensorFlow 2.0 documentation, the definition of the min_delta parameter is as follows:
min_delta: Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement.
Training produces the following output:

Train on 60000 samples, validate on 10000 samples
Epoch 1/15 60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15 60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now if I look at the last three epochs, viz. epochs 6, 7, and 8, the validation loss ('val_loss') values are:
0.0688, 0.0843 and 0.0847.
The differences between these 3 consecutive terms are 0.0155 and 0.0004. But isn't the first difference greater than the 'min_delta' defined in the callback?
The code I came up with for EarlyStopping is as follows:
import numpy as np

# list holding the last 'patience = 3' 'val_loss' values-
pv = [0.0688, 0.0843, 0.0847]

# compute the differences between consecutive elements in 'pv'-
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])
# minimum change required for monitored metric's improvement-
min_delta = 0.001
# Check whether the consecutive differences are greater than the 'min_delta' parameter-
check = differences > min_delta
check
# array([ True, False])

# Condition to see whether all 3 'val_loss' differences are less than 'min_delta',
# in which case training should stop since EarlyStopping is called-
if not np.any(check):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', not ALL of the differences between the last 3 epochs are smaller than the 'min_delta' of 0.001. For example, the first difference (0.0843 - 0.0688 = 0.0155) is greater than 0.001, while the second difference (0.0847 - 0.0843 = 0.0004) is less than 0.001.
Also, according to the definition of the patience parameter of EarlyStopping:
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should only be triggered when there is no improvement in 'val_loss' for 3 consecutive epochs, where an absolute change of less than 'min_delta' does not count as an improvement.
Then why is EarlyStopping called?
Code for model definition and 'fit()' are:
import tensorflow as tf   # was missing from the original imports but is used below
import matplotlib.pyplot as plt

import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model
# Specify the parameters to be used for layer-wise pruning; NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}
def pruned_nn(pruning_params):
    """
    Function to define the architecture of a neural network model
    following the 300-100 architecture for the MNIST dataset, using the
    provided parameters to prune the model.

    Input: 'pruning_params' - Python 3 dictionary containing the parameters used for pruning
    Output: Returns the designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(layers.Dropout(0.1))
    # 'num_classes' (= 10 for MNIST) is assumed to be defined earlier-
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile the pruned model-
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model
batch_size = 32
epochs = 50

# Instantiate NN-
orig_model = pruned_nn(pruning_params_unpruned)

# Train unpruned Neural Network-
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
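As an aside, one way to see exactly when the callback fires: verbose=1 is a standard EarlyStopping argument that makes the callback log the epoch at which training is stopped. A variant of the callback above:

# Same callback as before, but with verbose=1 so that EarlyStopping
# logs the epoch at which it stops training-
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3,
        min_delta=0.001, verbose=1
    )
]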
Upvotes: 3
Views: 2430
Reputation: 2718
The behaviour of the EarlyStopping callback is related to:

- min_delta, which is the minimum quantity to be considered an improvement between the performance of the monitored quantity in the current epoch and the best result so far in that metric.
- patience, which is the number of epochs without improvement (taking into consideration that an improvement has to be a change greater than min_delta) before stopping the algorithm.

In your case, the best val_loss is 0.0644 (epoch 5), so a later epoch would need a val_loss lower than 0.0644 - 0.001 = 0.0634 to be registered as an improvement. None of the last three epochs gets there:
Epoch 6/15 val_loss: 0.0873 | Difference is: + 0.0229
Epoch 7/15 val_loss: 0.0746 | Difference is: + 0.0102
Epoch 8/15 val_loss: 0.1088 | Difference is: + 0.0444
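To make this concrete, here is a minimal sketch of the comparison the callback performs (simplified from the Keras 'min'-mode logic; the loop and variable names are illustrative, not the actual implementation). The key point: each epoch's val_loss is compared against the best value seen so far, not against the previous epoch.

import numpy as np

min_delta = 0.001
patience = 3
best = np.inf   # best val_loss seen so far
wait = 0        # epochs since the last improvement

# the val_loss values from the training log above-
val_losses = [0.1117, 0.0801, 0.0709, 0.0787, 0.0644, 0.0873, 0.0746, 0.1088]

for epoch, current in enumerate(val_losses, start=1):
    if current < best - min_delta:   # improvement: beats the BEST by at least min_delta
        best = current
        wait = 0
    else:                            # no improvement over the best value
        wait += 1
        if wait >= patience:
            print(f"Epoch {epoch}: early stopping (best val_loss = {best})")
            break
# Prints: Epoch 8: early stopping (best val_loss = 0.0644)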
Be aware that the quantities printed in the training log are rounded, so you shouldn't base your calculations on them. Instead, take into consideration the true values, obtained from a callback or from the training History.
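For example, with the fit() call from the question (history_orig being the History object it returns), the exact values can be inspected like this:

# The History object stores the exact, un-rounded metric values per epoch-
val_losses = history_orig.history['val_loss']
best = min(val_losses)
print(best)                         # exact best validation loss
print(val_losses.index(best) + 1)   # 1-indexed epoch at which it occurred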
Upvotes: 2