garbage_collector

Reputation: 103

EarlyStopping does not stop training

As the title suggests, I am training my IRV2 network using the following EarlyStopping definition:

callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode="auto")

However, the training doesn't stop when I get three equal values of val_loss:

history = model.fit(
    X_train_s,
    y_train_categorical,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(X_validation_s, y_validation_categorical),
    callbacks=[callback]
    )

This is my model:

def Inception_Resnet_V2_Binary(x_train, batch_size):

    conv_base = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=image_size, pooling="avg")   

    for layer in conv_base.layers:
        layer.trainable = False

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Dense(2, activation='softmax'))

    steps_per_epoch = len(x_train) / batch_size
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=0.0002,
        decay_steps=steps_per_epoch * 2,
        decay_rate=0.7)

    opt = Adam(learning_rate=lr_schedule)

    model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy", tf.metrics.AUC()])

    return model

This is the training output:

Epoch 28/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5149 - accuracy: 0.7398 - auc_1: 0.8339 - val_loss: 0.5217 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 29/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5127 - accuracy: 0.7441 - auc_1: 0.8354 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 30/70
151/151 [==============================] - 12s 79ms/step - loss: 0.5144 - accuracy: 0.7384 - auc_1: 0.8321 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 31/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5152 - accuracy: 0.7402 - auc_1: 0.8332 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8246
Epoch 32/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5143 - accuracy: 0.7410 - auc_1: 0.8347 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8246
Epoch 33/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5124 - accuracy: 0.7404 - auc_1: 0.8352 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 34/70
151/151 [==============================] - 12s 81ms/step - loss: 0.5106 - accuracy: 0.7441 - auc_1: 0.8363 - val_loss: 0.5216 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 35/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5129 - accuracy: 0.7389 - auc_1: 0.8342 - val_loss: 0.5215 - val_accuracy: 0.7365 - val_auc_1: 0.8245
Epoch 36/70
151/151 [==============================] - 12s 79ms/step - loss: 0.5122 - accuracy: 0.7400 - auc_1: 0.8341 - val_loss: 0.5215 - val_accuracy: 0.7365 - val_auc_1: 0.8246
Epoch 37/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5160 - accuracy: 0.7424 - auc_1: 0.8346 - val_loss: 0.5215 - val_accuracy: 0.7365 - val_auc_1: 0.8246
Epoch 38/70
151/151 [==============================] - 12s 78ms/step - loss: 0.5175 - accuracy: 0.7367 - auc_1: 0.8318 - val_loss: 0.5215 - val_accuracy: 0.7365 - val_auc_1: 0.8246

Upvotes: 2

Views: 795

Answers (2)

jackve

Reputation: 319

That happens because the loss is actually still decreasing, but by a very small amount. If you don't set min_delta in the EarlyStopping callback, training will count even a negligible improvement as a real improvement. You can solve the problem by simply adding the min_delta argument, e.g. min_delta=0.001:

tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode="auto", min_delta=0.001)

You can set min_delta to whatever value you feel is suitable; with min_delta=0.001, any change in val_loss smaller than that value is treated as no improvement.
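For illustration, here is a minimal sketch of the adjusted callback (restore_best_weights is an optional extra, not part of the original answer), using the val_loss values from the training log above:

# With min_delta=0.001 and patience=3, an epoch only counts as an improvement
# if val_loss drops by at least 0.001 below the best value seen so far.
callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    mode='auto',
    min_delta=0.001,
    restore_best_weights=True)  # optional: roll back to the best epoch's weights

# With the log above: best val_loss ~0.5217, then 0.5216, 0.5216, 0.5215.
# Each step improves by far less than 0.001, so after three such epochs
# training stops instead of running on towards epoch 70.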

Upvotes: 3

BestDogeStackoverflow

Reputation: 1117

That val_loss is not a float with only 4 decimal digits; you are simply not seeing the entire value. patience is meant to stop a network that is overfitting from running for hours; if val_loss keeps decreasing, just let the network run (and judging by the loss, the learning rate seems a bit high).

val_loss is a float32, and those 4 decimals are only its most significant digits. To see what is really going on you can't rely on the fit output; you will need a callback of some sort that prints val_loss in the format you want.

You can find some examples here:

https://keras.io/guides/writing_your_own_callbacks/
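For example, a small custom callback along these lines (a sketch based on the linked guide; the class name is made up here) prints val_loss at full precision after every epoch:

import tensorflow as tf

class FullPrecisionValLoss(tf.keras.callbacks.Callback):
    # Print val_loss with more digits than the default progress bar shows
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_loss = logs.get('val_loss')
        if val_loss is not None:
            print(f"Epoch {epoch + 1}: val_loss = {val_loss:.8f}")

# Pass it alongside EarlyStopping:
# model.fit(..., callbacks=[callback, FullPrecisionValLoss()])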

Upvotes: 1
