Reputation: 675
ModelCheckpoint
works well when I train a new model: it saves checkpoints exactly as I want. The problem arises when I train the same model for n
more epochs. The epoch counter resets to 0, which produces checkpoint names like the following:
/checkpoints
checkpoint-01-0.24.h5
checkpoint-02-0.34.h5
checkpoint-03-0.37.h5
.
.
checkpoint-m-0.68.h5
checkpoint-01-0.71.h5
checkpoint-02-0.73.h5
checkpoint-03-0.74.h5
.
.
checkpoint-n-0.85.h5
As you can see, the epoch numbers start over. What I would like is to take the total number of epochs from the previous runs and add it to the new epoch numbers, to get something like this:
checkpoint-(m + 01)-0.71.h5
checkpoint-(m + 02)-0.73.h5
checkpoint-(m + 03)-0.74.h5
.
.
checkpoint-(m + n)-0.85.h5
Upvotes: 1
Views: 733
Reputation: 11553
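As the documentation for the .fit()
function says, there is a parameter that does exactly that:
initial_epoch: epoch at which to start training (useful for resuming a previous training run)
So just add:
model.fit(..., initial_epoch=m, epochs=m + n)
where, as in your example, m is the number of epochs already completed and n is the number of additional epochs. Note that when initial_epoch is set, epochs is the index of the final epoch rather than the number of epochs to run, so it has to be m + n. The first checkpoint of the resumed run is then numbered m + 1.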
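If you do not want to keep track of m by hand, one option is to read it back from the checkpoint file names. The following is only a minimal sketch: the helper names last_saved_epoch and resume_epochs are hypothetical, and it assumes the files follow the checkpoint-<epoch>-<metric>.h5 pattern from the question and that a checkpoint was written for every epoch (with save_best_only=True, the highest saved number may be lower than the number of epochs actually trained).

```python
import re
from pathlib import Path

# Matches names such as "checkpoint-07-0.68.h5" and captures the epoch number.
# The pattern is an assumption based on the file names shown in the question.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)-\d+\.\d+\.h5$")


def last_saved_epoch(checkpoint_dir):
    """Return the highest epoch number found in checkpoint_dir, or 0 if none."""
    epochs = []
    for path in Path(checkpoint_dir).glob("checkpoint-*.h5"):
        match = CHECKPOINT_RE.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)


def resume_epochs(checkpoint_dir, extra_epochs):
    """Return (initial_epoch, epochs) for training extra_epochs more epochs."""
    m = last_saved_epoch(checkpoint_dir)
    # With initial_epoch set, Keras treats `epochs` as the final epoch index.
    return m, m + extra_epochs


# Usage with Keras (model, x, y and the checkpoint callback are assumed to exist):
# initial_epoch, epochs = resume_epochs("checkpoints", extra_epochs=10)
# model.fit(x, y, initial_epoch=initial_epoch, epochs=epochs, callbacks=[checkpoint])
```

With an empty or missing directory the helper returns 0, so the same call also works for the very first training run.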
I hope this helps :)
Upvotes: 2