Saania

Reputation: 625

Queries regarding checkpoints of Object Detection API

I have a few questions about the TensorFlow Object Detection API.

  1. While training, only the 5 most recent checkpoints are kept. I want to keep more than that, say the 10 most recent. How can this be done? (I think it should be one of the parameters of train.proto in object_detection/protos.)

  2. By default, checkpoints are saved every 10 minutes (600 seconds). To change this frequency, I believe one of the following two parameters has to be changed; please confirm which one:

    from learning.py in /home/user/tensorflow-gpu/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim

    save_summaries_secs=600 or

    save_interval_secs=600

  3. While training my model (ssd_mobilenet_v2_coco_2018_03_29), I also run the evaluation simultaneously. The latest checkpoint shown in the eval graph always lags behind the latest one saved in the object_detection/training folder. For example, in the case below, the latest checkpoint shown on the graph is 29.437k, while the model has already been trained up to checkpoint 32.891k (which is saved in the training folder). What is the reason for this lag of about 20 minutes? Why isn't one step (10 minutes) enough to evaluate the trained model?

Upvotes: 0

Views: 1544

Answers (2)

junbong jang

Reputation: 11

This is for anyone who wants to configure the updated Object Detection API that supports TensorFlow 2.

  1. To keep the previous 10 checkpoints, open model_lib.py and pass the keyword argument max_to_keep=10 to every tf.train.Saver call.
  2. To change the frequency from 600 seconds to 3600 seconds (1 hour), open model_main.py and find the line that contains tf.estimator.RunConfig in the main function.
    Pass the keyword argument save_checkpoints_secs=3600 to the tf.estimator.RunConfig constructor.

Here is the code snippet after configuring checkpoint save frequency in model_main.py:

def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, save_checkpoints_secs=3600)

Please note that tf.estimator.RunConfig also has a keep_checkpoint_max parameter, but setting it didn't affect the number of saved checkpoints for me.
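To make the effect of max_to_keep concrete, here is a minimal pure-Python sketch of the retention rule: only the N most recent checkpoints survive, and older ones are dropped as new ones are saved. It is an illustration only, not TensorFlow's actual implementation; the function name retained_checkpoints and the step numbers are hypothetical.

```python
from collections import deque


def retained_checkpoints(steps, max_to_keep=5):
    """Return the checkpoint steps still on disk after saving each step in order.

    Illustrative model only: mimics "keep the N most recent checkpoints".
    A max_to_keep of None or 0 is treated as "never delete anything".
    """
    kept = deque(maxlen=max_to_keep or None)  # oldest entry is evicted automatically
    for step in steps:
        kept.append(step)
    return list(kept)


# Twelve hypothetical saves, one every 1000 training steps.
saved_steps = list(range(1000, 13000, 1000))

default_kept = retained_checkpoints(saved_steps)              # default of 5
more_kept = retained_checkpoints(saved_steps, max_to_keep=10)

print(default_kept)
print(more_kept)
```

With the default of 5, only the last five of the twelve saves remain; raising the limit to 10 keeps the last ten.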

Upvotes: 1

Srinivas Bringu

Reputation: 452

I believe the post below should help; it covers how to change keep_checkpoint_every_n_hours and max_to_keep:

How to store best models checkpoints, not only newest 5, in Tensorflow Object Detection API?

You can also refer to the official documentation: https://www.tensorflow.org/api_docs/python/tf/train/Saver

Upvotes: 0
