Luca

Reputation: 1810

val_loss missing in keras logs but correctly printed at the end of epochs

I'm trying to use the ModelCheckpoint callback in Keras. However, it keeps telling me that val_loss is not available. I added a print statement inside ModelCheckpoint to inspect the logs argument it receives. As you can see below, val_loss is indeed not present in the dictionary.

The weird thing is that val_loss is correctly reported at the end of each epoch, and it is present in the history object returned by model.fit. I do provide validation data (otherwise val_loss could not be evaluated at the end of each epoch).

...
3/3 - 65s - loss: 0.2053 - **val_loss: 0.1153**
Epoch 2/45
logs={'batch': 0, 'size': 30000, 'loss': 0.20355584}
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
...

Is this a bug or am I missing something?

I'm using Keras version '2.2.4-tf' (called from tf.keras)

Upvotes: 3

Views: 809

Answers (1)

user11530462

Reputation:

Adding the solution here, even though it is already available on GitHub, for the benefit of the Stack Overflow community.

The issue was caused by some confusion between keras.callbacks.ModelCheckpoint and tensorflow.keras.callbacks.ModelCheckpoint.

In the former (standalone Keras), the period argument controls how many epochs pass between saves. Saving therefore always happens at the end of an epoch, when val_loss has already been computed and added to logs.

In tensorflow.keras.callbacks.ModelCheckpoint, on the other hand, an integer save_freq controls how many batches pass between saves. This causes the callback to be evaluated in the middle of an epoch, where val_loss is not yet available.

Changing save_freq to 'epoch' (the default) resolved the issue.
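For illustration, here is a minimal sketch of a configuration where the callback fires at epoch end and therefore sees val_loss. The model, data, and file path are placeholders, not the asker's actual code:

```python
import numpy as np
import tensorflow as tf

# Toy data and model just to make the sketch runnable; the real ones don't matter.
x_train, y_train = np.random.rand(100, 10), np.random.rand(100, 1)
x_val, y_val = np.random.rand(20, 10), np.random.rand(20, 1)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="adam", loss="mse")

# With save_freq='epoch' (the default) the callback runs at epoch end,
# after the validation pass, so 'val_loss' is present in the logs dict.
# An integer save_freq would instead mean "every N batches", firing
# mid-epoch before val_loss exists.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5",          # illustrative path
    monitor="val_loss",
    save_best_only=True,
    save_freq="epoch",
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=3,
          callbacks=[checkpoint])
```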

Upvotes: 1
