Keras: correctly saving checkpoints when continuing training for extra epochs (initial epoch)

Question

ModelCheckpoint works great when I train a new model, and it saves checkpoints the way I want. The problem arises when I decide to train the same model for n more epochs: the epoch counter is reset to 0, which produces checkpoint names like the following:

/checkpoints
    checkpoint-01-0.24.h5
    checkpoint-02-0.34.h5
    checkpoint-03-0.37.h5
              .
              .
    checkpoint-m-0.68.h5
    checkpoint-01-0.71.h5
    checkpoint-02-0.73.h5
    checkpoint-03-0.74.h5
              .
              .
    checkpoint-n-0.85.h5

As you can see, the epoch numbers start over. What I would like is to take the total number of epochs from the previous runs and add the new epochs to it, to get something like this:

    checkpoint-(m + 01)-0.71.h5
    checkpoint-(m + 02)-0.73.h5
    checkpoint-(m + 03)-0.74.h5
              .
              .
    checkpoint-(m + n)-0.85.h5

Nassim Ben · Accepted Answer

As the documentation for the .fit() function says, there is a parameter that does exactly that:

initial_epoch: epoch at which to start training (useful for resuming a previous training run)

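so just add:

model.fit(..., epochs=m + n, initial_epoch=m)

where, as in your example, m is the number of epochs already trained. Keras then numbers the first resumed epoch m + 1, so the checkpoint names continue where the previous run stopped. Note that when initial_epoch is set, epochs is the index of the final epoch rather than the number of additional epochs, which is why it is m + n here.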
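If you do not want to track m by hand, one option is to recover it from the checkpoint files already on disk. The sketch below is only an illustration under some assumptions: the helper name last_saved_epoch, the directory name "checkpoints", and the template with a val_acc field are hypothetical, chosen to match the file names in the question (your metric key may be different, e.g. val_accuracy).

```python
import re
from pathlib import Path

# Hypothetical filename template, modelled on the names in the question.
# ModelCheckpoint fills in {epoch} with the 1-based epoch number.
CHECKPOINT_PATTERN = "checkpoint-{epoch:02d}-{val_acc:.2f}.h5"

# Matches names such as "checkpoint-03-0.37.h5" and captures the epoch part.
_EPOCH_RE = re.compile(r"^checkpoint-(\d+)-\d+\.\d+\.h5$")


def last_saved_epoch(checkpoint_dir):
    """Return the highest epoch number among saved checkpoints, or 0 if none are found."""
    epochs = []
    for path in Path(checkpoint_dir).glob("checkpoint-*.h5"):
        match = _EPOCH_RE.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)


# Usage sketch (assumes `model`, `x_train`, `y_train` and `n` already exist):
#
#   from keras.callbacks import ModelCheckpoint
#
#   m = last_saved_epoch("checkpoints")
#   model.fit(
#       x_train, y_train,
#       epochs=m + n,       # index of the final epoch, not the number of extra epochs
#       initial_epoch=m,    # resume counting after the m epochs already trained
#       callbacks=[ModelCheckpoint("checkpoints/" + CHECKPOINT_PATTERN)],
#   )
```

This only restores the epoch numbering; the model weights still have to be loaded from the last checkpoint separately if you are starting a fresh process.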

I hope this helps :)
