TensorFlow checkpoints are saved at every step

Question

I am using the TensorFlow Object Detection API to train a two-class model. Training starts at step 0 and logs every 100 steps (100, 200, 300, 400, 500, ...). When it reaches step 1000, it runs an evaluation and I can view the results in TensorBoard. After step 1000, however, a checkpoint is saved at every single step (1001, 1002, 1003, ...), and evaluation also runs at every single step. Why does this happen?

TensorFlow version: nvidia-tensorflow 1.15

Training is based on: https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_ssdlite_mobiledet_qat_tf1.ipynb

Vignesh Kathirkamar · Accepted Answer

I found a fix, but I don't understand it in depth.

In the Python file "run_config.py", located at "python3.6/site-packages/tensorflow_estimator/python/estimator/run_config.py", the parameter "save_checkpoints_steps" was assigned the value "_USE_DEFAULT". After I changed it to 1000, the problem went away and checkpoints were saved only every 1000 steps.

I still don't know why "_USE_DEFAULT" caused a checkpoint to be saved at every single step.
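
Editing a file inside site-packages changes the default for every Estimator on the machine and is lost when the package is reinstalled. The same value can probably be set from the training script instead, since "save_checkpoints_steps" is a regular constructor argument of tf.estimator.RunConfig. Below is a minimal sketch, assuming TensorFlow 1.15 and assuming the training script builds its own RunConfig somewhere; the MODEL_DIR path is a hypothetical placeholder, not something from my setup.

```python
import tensorflow as tf  # assumes TensorFlow 1.15, as in the question

# Hypothetical output directory; substitute the model_dir used for training.
MODEL_DIR = "/tmp/ssdlite_mobiledet_training"

# Passing save_checkpoints_steps explicitly selects step-based checkpointing,
# so the "_USE_DEFAULT" fallback in run_config.py is not used for this setting.
config = tf.estimator.RunConfig(
    model_dir=MODEL_DIR,
    save_checkpoints_steps=1000,
)

# Inspect what the Estimator will actually use.
print("save_checkpoints_steps:", config.save_checkpoints_steps)
print("save_checkpoints_secs:", config.save_checkpoints_secs)
```

The resulting config object would then be passed to the Estimator in place of the one the script creates by default. Printing the two properties is a quick way to confirm which checkpoint schedule is in effect before starting a long training run.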
