Faizan Ali

Reputation: 1013

How do I know the total number of steps while training using Tensorflow Object Detection API?

I've been running a training job for the last 3 hours on a GPU-powered cloud machine with the following command:

python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

Since starting it, the log has been printing lines like these:

 INFO:tensorflow:global step 14455: loss = 0.5896 (0.775 sec/step)
I1001 19:27:43.575182 140054916601600 tf_logging.py:116] global step 14455: loss = 0.5896 (0.775 sec/step)

How do I find out how many steps there are in total, or how many are still left to run?

Upvotes: 2

Views: 3231

Answers (2)

Janikan

Reputation: 370

Line 163 of ssd_mobilenet_v1_pets.config says:

num_steps: 200000

This is the total number of steps the training script will run, unless you have changed that value.
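To turn that into "how much is left", here is a rough sketch that reads num_steps out of the config text and combines it with the step number and sec/step from your log line. The helper names (read_num_steps, remaining_hours) are made up for this example, and the inline config excerpt stands in for your real file; it also assumes the sec/step rate stays roughly constant, which it may not.

```python
import re


def read_num_steps(config_text):
    """Return the num_steps value found in pipeline config text, or None if absent."""
    # Matches a line such as "  num_steps: 200000"; commented-out lines are skipped
    # because the pattern only allows whitespace before the key.
    match = re.search(r"^\s*num_steps\s*:\s*(\d+)", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def remaining_hours(total_steps, current_step, sec_per_step):
    """Estimate the hours left, assuming sec_per_step stays constant."""
    remaining = max(total_steps - current_step, 0)
    return remaining * sec_per_step / 3600.0


# Hypothetical excerpt; in practice read the text with
# open("training/ssd_mobilenet_v1_pets.config").read()
sample_config = """
train_config: {
  num_steps: 200000
}
"""

total = read_num_steps(sample_config)
current_step = 14455   # from the "global step 14455" log line
sec_per_step = 0.775   # from the same log line

print(total - current_step, "steps left")
print(round(remaining_hours(total, current_step, sec_per_step), 1), "hours left")
```

With the numbers from your log this works out to 185545 steps, or roughly 40 hours at the current rate.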

Upvotes: 0

Prune

Reputation: 77837

If you're using a pre-defined model topology, look up the training period (in epochs or steps) in the documentation that comes with the model. If you've built your own model, you determine the training period by watching the test results. When the accuracy reaches an acceptable level and then starts to drop, you're likely over-training; back up to the checkpoint at the high point of accuracy. Repeat this experiment a few times to find the "sweet spot" for your model.
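As a minimal sketch of "back up to the high point": if you record the evaluation metric at each checkpoint, picking the stopping point is just a matter of finding the maximum. The function name best_checkpoint and the numbers in history are invented for illustration, not results from any real run; for a detection model the metric would typically be something like mAP rather than plain accuracy.

```python
def best_checkpoint(eval_history):
    """Return the (step, metric) pair with the highest metric value.

    eval_history is a list of (global_step, metric) tuples, one per evaluated
    checkpoint. On ties, the earliest checkpoint wins.
    """
    return max(eval_history, key=lambda pair: pair[1])


# Made-up values for illustration only: the metric rises, peaks, then declines.
history = [
    (2000, 0.61),
    (4000, 0.70),
    (6000, 0.74),
    (8000, 0.73),
    (10000, 0.71),
]

step, metric = best_checkpoint(history)
print("best checkpoint at step", step, "with metric", metric)
```

Here the peak is at step 6000, so that is the checkpoint you would keep; training past it only costs time and accuracy.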

Upvotes: 1
