Is it possible in TF/Keras to save the best model AFTER X epochs?

Question

My models train really fast, but training slows down because I'm saving the best model (to load in another process), and the saving itself adds latency. In the early stages of fitting, nearly every iteration is an improvement, so each one triggers a save and adds more and more delay.

I wonder if there is a way to save the best model only AFTER X epochs, or to save it in the background, so that training isn't delayed by saving too often?

For clarity, this is how I'm running ModelCheckpoint in Keras/TF2:

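from tensorflow.keras.callbacks import ModelCheckpoint

# Save the model to this file whenever the monitored training loss improves.
filepath = "BestModel.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)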

Accepted Answer

You can use the save_freq argument of the ModelCheckpoint callback to control how often the model is saved. By default it is set to 'epoch', which means the model is saved at the end of each epoch; it can also be set to an integer, in which case the model is saved after that many batches. Here is the relevant part of the documentation for reference:

save_freq: 'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using integer, the callback saves the model at end of this many batches. If the Model is compiled with experimental_steps_per_execution=N, then the saving criteria will be checked every Nth batch. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'.
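
Since save_freq counts batches rather than epochs, one way to check for a new best model only every X epochs is to multiply X by the number of batches per epoch. The following is a minimal sketch under that assumption; the helper names batches_for_epochs and make_checkpoint are hypothetical and not part of Keras, and the sample counts are made-up illustration values.

```python
import math


def batches_for_epochs(num_samples, batch_size, epochs):
    """Return the number of batches that make up `epochs` full epochs."""
    # A final partial batch still counts as one batch, hence the ceiling.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return epochs * steps_per_epoch


def make_checkpoint(filepath, num_samples, batch_size, every_n_epochs):
    """Build a ModelCheckpoint that checks for a new best model every N epochs.

    Hypothetical helper: it only wraps the documented save_freq argument.
    """
    # Imported here so the arithmetic helper above works without TensorFlow.
    from tensorflow.keras.callbacks import ModelCheckpoint

    return ModelCheckpoint(
        filepath,
        monitor="loss",
        save_best_only=True,
        mode="min",
        verbose=1,
        save_freq=batches_for_epochs(num_samples, batch_size, every_n_epochs),
    )


# Illustration: 1,000 samples with batch_size=50 is 20 batches per epoch,
# so checking every 5 epochs corresponds to save_freq=100.
print(batches_for_epochs(1000, 50, 5))  # 100
```

With the setup from the question, this would be used as model.fit(x, y, epochs=40, batch_size=50, callbacks=[make_checkpoint("BestModel.hdf5", len(x), 50, 5)]). Choosing a whole multiple of the batches per epoch should keep each check aligned with the end of an epoch, which sidesteps the caveat in the documentation about the monitored metric reflecting only a few batches.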
