Reputation: 61
I am trying to train a TensorFlow object detection model on a custom dataset in Google Colab. I have a saved model trained for 5000 steps. Is it possible to use the saved model to resume training? I am planning to train for another 20000 steps. The training will take around 36 hours, so I'm planning to use checkpoints. How can I store the best model checkpoints and use them when the session runs out?
Upvotes: 4
Views: 6322
Reputation: 2385
To resume training using the weights from a saved checkpoint, in your pipeline.config file, change the line containing fine_tune_checkpoint from <path_to_ckpt>/model.ckpt to <path_to_ckpt>/model.ckpt-XXXX, where XXXX is your checkpoint number.
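If you would rather not edit the file by hand after every Colab session, the change described above can be scripted. This is only a rough sketch using the Python standard library: the helper names latest_checkpoint_prefix and point_config_at are made up for illustration, and it assumes TF1-style checkpoints where each one has a model.ckpt-XXXX.index file and the config holds a line of the form fine_tune_checkpoint: "...".

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(ckpt_dir):
    """Return the checkpoint prefix with the highest step number, or None.

    Assumes files named model.ckpt-<step>.index (hypothetical layout,
    matching the model.ckpt-XXXX naming used in the answer).
    """
    best_step, best_prefix = -1, None
    for index_file in Path(ckpt_dir).glob("model.ckpt-*.index"):
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", index_file.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the trailing ".index" to get the prefix the config expects.
            best_prefix = str(index_file.with_suffix(""))
    return best_prefix


def point_config_at(config_text, ckpt_prefix):
    """Return config_text with its fine_tune_checkpoint line rewritten."""
    new_line = 'fine_tune_checkpoint: "{}"'.format(ckpt_prefix)
    # A function replacement avoids backslash-escape issues in paths.
    return re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda _match: new_line,
        config_text,
        count=1,
    )
```

You would read pipeline.config as text, pass it through point_config_at together with the result of latest_checkpoint_prefix(<path_to_ckpt>), and write the result back before restarting training.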
As for saving only the best weights, you can refer to this post and/or this GitHub link.
Upvotes: 2