GoingMyWay

Reputation: 17468

Question on restoring training after loading model

After training for 24 hours, the training process saved the model files via torch.save. Then a power failure (or some other issue) caused the process to exit. Normally, we can load the model and continue training from the last step.

Should we also load the states of the optimizers (Adam, etc.)? Is it necessary?

Upvotes: 1

Views: 613

Answers (2)

Sagnik Mukherjee

Reputation: 106

Yes, you can load the model from the last step and resume training from that very step.

If you want to use the model only for inference, it is enough to save the model's state_dict:

torch.save(model.state_dict(), PATH)

And load it as

model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

However, to resume training you need to save the optimizer's state dict as well. For that purpose, save a checkpoint like this:

torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            ...
            }, PATH)

and load the model for further training as:

model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()   # for inference
# - or -
model.train()  # to resume training

It is necessary to save the optimizer state dict, because optimizers such as Adam keep per-parameter buffers (for Adam, running averages of the gradients and their squares) that are updated as the model trains. If you drop them, the optimizer restarts from scratch even though the model weights are restored.
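To see what this state actually contains, here is a minimal runnable sketch; the tiny Linear model is just a placeholder:

import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One update so Adam populates its per-parameter buffers.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

state = optimizer.state_dict()
print(state['param_groups'][0]['lr'])  # hyperparameters, e.g. the learning rate
first = next(iter(state['state'].values()))
print(sorted(first.keys()))            # ['exp_avg', 'exp_avg_sq', 'step']

Those exp_avg and exp_avg_sq buffers are exactly what would be lost if you saved only the model's state_dict.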

Upvotes: 2

Bedir Yilmaz

Reputation: 4083

It is necessary to load the state of the optimizer in some cases, for example when a learning rate scheduler is being used.

In that case, restoring the saved state brings the learning rate of the optimizer back to where it was at the checkpoint, instead of restarting the schedule from the beginning.
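For example, a minimal sketch of checkpointing a scheduler alongside the model and optimizer (StepLR, the file name, and the checkpoint keys are just illustrative choices):

import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Save the scheduler state next to the model and optimizer states.
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
}, 'checkpoint.pt')

# On restart, restore all three so the learning-rate schedule
# picks up where it left off instead of starting over.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])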

Upvotes: 1
