TensorFlow Object Detection API: How to disable loading from checkpoint

Question

I have created a custom variation of MobileNetV2 feature extractor architecture, by changing the expansion_size from 6 to 4 in research/slim/nets/mobilenet/mobilenet_v2.py of tensorflow/models repo.

I want to be able to train the SSD + Mobilenet_v2 (with this change) model with model_main.py script as described at Object Detection API's running_locally tutorial.

When I do so I see the following error, which makes sense:

`InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.`

To address this:

I removed the fine_tune_checkpoint specification from my pipeline.config.
I changed load_pretrained=True to load_pretrained=False in object_detection/model_hparams.py.
I added --hparams_overrides='load_pretrained=false' as a command line input argument to model_main.py.

Despite these changes, I still see the same error.

Why is TensorFlow still trying to restore a checkpoint? How can I make it not do so?

bappak · Accepted Answer

I found the solution myself. Even though I had set up my pipeline configuration file so that no checkpoint would be restored, the internal tf.Estimator object can still load a checkpoint from the model_dir specified. This happens even though the primary use of model_dir is as an output directory, where new checkpoints are written.

I found this information in the official documentation for tf.Estimator. Here's the relevant excerpt for reference:

`model_dir: Directory to save model parameters, graph and etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If PathLike object, the path will be resolved. If None, the model_dir in config will be used if set. If both are set, they must be same. If both are None, a temporary directory will be used.`
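To see whether a given model_dir already holds something the estimator would pick up, you can look at the small text file named `checkpoint` that TensorFlow writes next to the checkpoint data. Here is a rough sketch using only the Python standard library. The function name `latest_checkpoint_prefix` is my own, not part of TensorFlow, and the parsing assumes the usual `model_checkpoint_path: "..."` line in that file. With TensorFlow installed, `tf.train.latest_checkpoint(model_dir)` is the supported way to get the same answer.

```python
import os
import re
from typing import Optional


def latest_checkpoint_prefix(model_dir: str) -> Optional[str]:
    """Return the checkpoint prefix recorded in model_dir, or None if there is none.

    Hypothetical helper: it only reads the `checkpoint` index file and does
    not verify that the checkpoint data files themselves are present.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, "r", encoding="utf-8") as f:
        for line in f:
            # Matches e.g.: model_checkpoint_path: "model.ckpt-1234"
            # (the all_model_checkpoint_paths lines do not match).
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if m:
                path = m.group(1)
                # Relative entries are relative to the directory itself.
                return path if os.path.isabs(path) else os.path.join(model_dir, path)
    return None
```

If this returns anything other than None for the directory you pass as --model_dir, training will most likely try to resume from that checkpoint, regardless of what pipeline.config says.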

I had an old checkpoint sitting in my original model_dir which was architecturally incompatible with my custom model. Hence I was seeing the error. To resolve it, I simply changed my model_dir to point to a new empty directory and it worked. I hope that helps someone with a similar problem.
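If you retrain often, one way to avoid hitting this again is to generate a fresh output directory for every run instead of reusing one. A minimal sketch follows. The helper name `fresh_model_dir`, the `run` tag, and the timestamp naming scheme are all my own choices for illustration, not anything the Object Detection API requires.

```python
import os
import time


def fresh_model_dir(base_dir: str, tag: str = "run") -> str:
    """Create and return a new, empty subdirectory of base_dir.

    Hypothetical helper: the directory name is <tag>-<timestamp>, with a
    numeric suffix appended if that name is already taken.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    candidate = os.path.join(base_dir, f"{tag}-{stamp}")
    path = candidate
    suffix = 0
    while os.path.exists(path):
        suffix += 1
        path = f"{candidate}-{suffix}"
    os.makedirs(path)  # also creates base_dir if it does not exist yet
    return path
```

You would then pass the returned path as the --model_dir argument to model_main.py, so the estimator starts with no checkpoint to restore.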
