Reputation: 17468
After training for 24 hours, the training process saved the model files via torch.save. Then a power-off or some other issue caused the process to exit. Normally, we can load the model and continue training from the last step.
Shouldn't we also load the states of the optimizers (Adam, etc.)? Is that necessary?
Upvotes: 1
Views: 613
Reputation: 106
Yes, you can load the model from the last step and retrain it from that very step.
If you want to use it only for inference, it is enough to save the state_dict of the model:
torch.save(model.state_dict(), PATH)
and load it as:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
However, to resume training you need to save the optimizer's state dict as well. For that purpose, save a checkpoint like this:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
and load the model for further training as:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
model.eval()   # for inference
# - or -
model.train()  # to continue training
It is necessary to save the optimizer state dictionary, since this contains buffers and parameters that are updated as the model trains.
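As a quick illustration (the toy model and optimizer below are just placeholders, not from your code), you can inspect the optimizer's state dict after a single step to see those buffers. For Adam they are the running moment estimates and a step counter; if you discard them, the adaptive per-parameter learning rates effectively start from scratch:
import torch
import torch.nn as nn

# Placeholder model and Adam optimizer for illustration only
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step so the optimizer populates its internal buffers
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# Each parameter's entry holds 'step', 'exp_avg' and 'exp_avg_sq';
# losing them resets Adam's adaptive behaviour on resume
for state in optimizer.state_dict()['state'].values():
    print(state.keys())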
Upvotes: 2
Reputation: 4083
It is necessary to load the state of the optimizer in some cases, for example when a learning rate scheduler is being used.
In that case, the optimizer's learning rate will be restored to the value it had at the saved state, instead of restarting from the initial value.
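A minimal sketch of that case (the StepLR scheduler and the 'checkpoint.pt' filename are illustrative choices, not from the question): save the scheduler's state_dict alongside the optimizer's and restore both when resuming, so the schedule continues where it left off.
import torch

# Assumed setup: a toy model, Adam, and a step-based LR scheduler
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Save model, optimizer and scheduler state together
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
}, 'checkpoint.pt')

# On resume, restore all three so the learning rate picks up
# from the saved point rather than restarting the schedule
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])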
Upvotes: 1