Does Keras ModelCheckpoint save the best model across multiple fitting sessions?

If I have a Keras model fitted with the ModelCheckpoint callback and fit it in several 'fitting sessions' (i.e. I call model.fit() multiple times), will the callback save the best model in the most recent fitting session or the best model out of all fitting sessions?

Thanks.

Upvotes: 5

Answers (4)

Denziloe

Reputation: 8131

Yes, a checkpoint will only be saved if the performance is better than the best seen across all calls to fit. In other words, if none of the epochs in the latest call to fit performs better than the best epoch from a previous call to fit, the previous checkpoint won't be overwritten.

There is one proviso: you must remember to create the callback outside of the call to fit. That is, do this:

from tensorflow import keras

checkpoint_callback = keras.callbacks.ModelCheckpoint(
    "checkpoint.h5", save_best_only=True)

# Reuse the same callback object in every call to fit
model.fit(..., callbacks=[checkpoint_callback])
...
model.fit(..., callbacks=[checkpoint_callback])

not this:

# A brand-new callback (with a fresh best value) is created in each call
model.fit(..., callbacks=[keras.callbacks.ModelCheckpoint(
                   "checkpoint.h5", save_best_only=True)])
...
model.fit(..., callbacks=[keras.callbacks.ModelCheckpoint(
                   "checkpoint.h5", save_best_only=True)])

The checkpoint callback object has a best attribute that stores the best monitored value seen so far (initially set to the worst possible value, e.g. infinity when lower is better). It is not reset when the same object is passed to fit again. However, if you instantiate a new callback object inside each call to fit, as in the second snippet, best naturally starts at the worst possible value rather than at the best value recorded by the callback objects used in earlier calls to fit.
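To make the difference concrete, here is a minimal plain-Python sketch of the save_best_only bookkeeping described above. SaveBestOnlySketch and fake_fit are hypothetical stand-ins, not Keras classes, and the loss values are made up; the sketch only assumes, as this answer states, that best is initialised once when the callback is created and is not reset between calls to fit.

```python
import math


class SaveBestOnlySketch:
    """Hypothetical stand-in for ModelCheckpoint(save_best_only=True, mode="min").

    best is set once in __init__ and is NOT reset at the start of each fit().
    """

    def __init__(self):
        self.best = math.inf  # worst possible value when lower is better
        self.saved = []       # (session, epoch, val_loss) for every "overwrite"

    def on_epoch_end(self, session, epoch, val_loss):
        # Overwrite the checkpoint only if this epoch beats every value seen so far.
        if val_loss < self.best:
            self.best = val_loss
            self.saved.append((session, epoch, val_loss))


def fake_fit(callback, session, val_losses):
    """Hypothetical fit(): feeds precomputed per-epoch val_loss values to the callback."""
    for epoch, val_loss in enumerate(val_losses, start=1):
        callback.on_epoch_end(session, epoch, val_loss)


session1 = [0.50, 0.40, 0.45]  # made-up losses; best is 0.40
session2 = [0.48, 0.43, 0.42]  # never beats 0.40

# Same callback object in both sessions: best carries over, so session 2 never overwrites.
shared = SaveBestOnlySketch()
fake_fit(shared, 1, session1)
fake_fit(shared, 2, session2)
print(shared.saved)  # [(1, 1, 0.5), (1, 2, 0.4)]

# New callback per session: best restarts at inf, so session 2's epochs overwrite the file.
first = SaveBestOnlySketch()
fake_fit(first, 1, session1)
second = SaveBestOnlySketch()
fake_fit(second, 2, session2)
print(second.saved)  # [(2, 1, 0.48), (2, 2, 0.43), (2, 3, 0.42)]
```

With the fresh callback, the file written to the shared path ends up holding the 0.42 model even though a 0.40 model existed after the first session.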

Upvotes: 0

Gerry P

Reputation: 8092

Good question. I did an experiment with an existing model and data set. I created a checkpoint callback as shown below and used it in model.fit:

import tensorflow as tf

file_path1=r'c:\temp\file1'
mchk=tf.keras.callbacks.ModelCheckpoint( filepath=file_path1,  monitor="val_loss", verbose=1,
    save_best_only=True, save_weights_only=True, mode="auto", save_freq="epoch" )

history = model.fit(X_train, Y_train, validation_data=val_data,
                     batch_size= 128, epochs= 5,  verbose= 1, callbacks=[mchk])

I saved only the weights, and only for the epoch with the lowest validation loss. I set verbose=1 in the callback so I could see the validation loss at each epoch. Next I ran essentially the same code again, but changed the file path to file2:

file_path2=r'c:\temp\file2'
mchk=tf.keras.callbacks.ModelCheckpoint( filepath=file_path2,  monitor="val_loss", verbose=1,
    save_best_only=True, save_weights_only=True, mode="auto", save_freq="epoch" )

history = model.fit(X_train, Y_train, validation_data=val_data,
                     batch_size= 128, epochs= 5,  verbose= 1, callbacks=[mchk])

The model keeps its state at the end of a session, so if you call model.fit a second time, training continues from where it left off. However, the state of the callback is not carried over: on the second run the newly created callback initializes its best validation loss to np.inf, so it is guaranteed to save the weights at the end of the first epoch. If you don't change the file name, this overwrites the file saved during the first run.

If the best validation loss in the second run is LOWER than the best validation loss in the first run, you end up with the best weights overall. However, if it is higher, the file no longer holds the OVERALL best weights.

That's how it works when the callback has save_weights_only=True. I thought it might behave differently when saving the entire model, because the callback state might then be preserved, so I reran the experiment with save_weights_only=False. The results indicate that saving the entire model does not preserve the state of the callback either. I am using TensorFlow 2.0, and results may differ in other versions, so I would run this experiment on your version to see whether it behaves similarly.
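If you do need a fresh callback for each run, one way to avoid losing the overall best is to start the new callback from the previous best value instead of np.inf. Later tf.keras/Keras releases added an initial_value_threshold argument to ModelCheckpoint for this purpose (check that your version has it; this experiment used TF 2.0). The sketch below only mimics that idea in plain Python; SaveBestOnlySketch and fake_fit are hypothetical helpers and the losses are made up.

```python
import math


class SaveBestOnlySketch:
    """Hypothetical stand-in for ModelCheckpoint(save_best_only=True, mode="min")."""

    def __init__(self, initial_value_threshold=None):
        # Start from a known best value if one is given, otherwise from +inf.
        self.best = math.inf if initial_value_threshold is None else initial_value_threshold
        self.saved = []  # (epoch, val_loss) for every "overwrite"

    def on_epoch_end(self, epoch, val_loss):
        # Save only when this epoch beats the current best.
        if val_loss < self.best:
            self.best = val_loss
            self.saved.append((epoch, val_loss))


def fake_fit(callback, val_losses):
    """Hypothetical fit(): feeds made-up per-epoch val_loss values to the callback."""
    for epoch, val_loss in enumerate(val_losses, start=1):
        callback.on_epoch_end(epoch, val_loss)


run1 = SaveBestOnlySketch()
fake_fit(run1, [0.50, 0.40, 0.45])  # best after run 1 is 0.40

# New callback for run 2, seeded with run 1's best value instead of +inf.
run2 = SaveBestOnlySketch(initial_value_threshold=run1.best)
fake_fit(run2, [0.48, 0.43, 0.38])
print(run2.saved)  # [(3, 0.38)]
```

With a real callback, this would roughly mean reading mchk.best after the first fit and passing it as initial_value_threshold when constructing the second ModelCheckpoint, assuming your version supports that argument.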

Upvotes: 2

yudhiesh

Reputation: 6799

It would save the best model from the last fit(), since you are essentially overwriting the same file.

If you want to find the best model over N calls to fit(), save each one with its index N in the file name. That way each file holds the best model from one particular fit(), and you can easily compare them later. You can just set N manually (1, 2, 3, ..., N) for each fit().

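# Example: one checkpoint file per call to fit()
from tensorflow.keras.callbacks import ModelCheckpoint

N = 1  # set to 1, 2, 3, ... before each call to fit()
checkpoint = ModelCheckpoint(
        f'/home/jupyter/checkpoint/best_model_{N}.h5',  # f-string, so {N} is filled in here
        monitor="val_loss",
        save_best_only=True,
        save_weights_only=False,
        mode="min")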
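Comparing the files afterwards can be as simple as picking the N with the lowest monitored value. The sketch below assumes you recorded each callback's best value (its best attribute) after the corresponding fit() in a dict keyed by N; pick_best_checkpoint is a hypothetical helper and the numbers are illustrative only.

```python
def pick_best_checkpoint(best_per_fit, template="/home/jupyter/checkpoint/best_model_{}.h5"):
    """Return (N, path) for the fit() whose best val_loss is lowest (mode="min")."""
    best_n = min(best_per_fit, key=best_per_fit.get)
    return best_n, template.format(best_n)


# Illustrative values: best val_loss recorded after fit() number 1, 2 and 3.
best_per_fit = {1: 0.40, 2: 0.42, 3: 0.37}
n, path = pick_best_checkpoint(best_per_fit)
print(n, path)  # 3 /home/jupyter/checkpoint/best_model_3.h5
```

The chosen file could then be loaded with keras.models.load_model(path), since these checkpoints store the full model (save_weights_only=False).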

Upvotes: 0

Minh Vũ Hoàng

Reputation: 120

It will save the best model from the most recent fitting session.

Upvotes: 1
