Reputation: 1044
I have created a custom variation of the MobileNetV2 feature extractor architecture by changing the `expansion_size` from 6 to 4 in `research/slim/nets/mobilenet/mobilenet_v2.py` of the `tensorflow/models` repo.
I want to train the SSD + MobileNetV2 model (with this change) using the `model_main.py` script, as described in the Object Detection API's running_locally tutorial.
When I do so, I see the following error, which makes sense:
`InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.`
To address this, I tried the following:
- Removed the `fine_tune_checkpoint` specification from my `pipeline.config`.
- Changed `load_pretrained=True` to `load_pretrained=False` in `object_detection/model_hparams.py`.
- Passed `--hparams_overrides='load_pretrained=false'` as a command-line argument to `model_main.py`.

Despite these changes, I still see the same error.
Why is TensorFlow still trying to restore a checkpoint, and how can I stop it from doing so?
Upvotes: 1
Views: 1971
Reputation: 1044
I found the solution myself. Even though I had set up my pipeline configuration file so that no checkpoint would be restored, the internal `tf.estimator.Estimator` object can still load a checkpoint from the specified `model_dir`, even though the primary use of `model_dir` is as an output directory where checkpoints are written.
I found this in the official documentation for `tf.estimator.Estimator`. Here is the relevant excerpt for reference:
`model_dir: Directory to save model parameters, graph and etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If PathLike object, the path will be resolved. If None, the model_dir in config will be used if set. If both are set, they must be same. If both are None, a temporary directory will be used.`
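To see which checkpoint a given directory would offer up, a minimal sketch like the following can help. It uses only the standard library and assumes the usual TensorFlow layout, in which the directory holds a text file named `checkpoint` containing a `model_checkpoint_path` entry. The function name `checkpoint_in_model_dir` is made up for illustration, and this only approximates what `tf.train.latest_checkpoint` reports; it is not TensorFlow's own code.

```python
import os
import re


def checkpoint_in_model_dir(model_dir):
    """Return the checkpoint prefix recorded in model_dir, or None.

    Assumption: model_dir follows the usual TensorFlow layout, where a text
    file named "checkpoint" records the most recent checkpoint prefix.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None

    with open(state_file) as f:
        text = f.read()

    # Match only the "model_checkpoint_path" line, not the
    # "all_model_checkpoint_paths" lines that may follow it.
    match = re.search(r'^model_checkpoint_path:\s*"(.*)"\s*$', text, re.MULTILINE)
    if match is None:
        return None

    prefix = match.group(1)
    # The state file often stores paths relative to the directory itself.
    if not os.path.isabs(prefix):
        prefix = os.path.join(model_dir, prefix)
    return prefix
```

If this returns a path for the `model_dir` you are about to train into, training will most likely try to resume from that checkpoint, regardless of what the pipeline config says about fine-tuning.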
I had an old checkpoint sitting in my original `model_dir` that was architecturally incompatible with my custom model, hence the error. To resolve it, I simply pointed `model_dir` at a new, empty directory, and it worked. I hope this helps someone with a similar problem.
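If you want to guard against this before launching training, here is a rough sketch of the workaround. The helper names `contains_checkpoints` and `fresh_model_dir` are hypothetical, and the check assumes checkpoint files use the default `model.ckpt` prefix alongside a `checkpoint` state file. It leaves the old directory untouched and hands back a new, empty one when stale checkpoints are found.

```python
import os
import time


def contains_checkpoints(model_dir):
    """True if model_dir holds files that look like TensorFlow checkpoints.

    Assumption: checkpoints use the default "model.ckpt" prefix and the
    directory carries a "checkpoint" state file.
    """
    if not os.path.isdir(model_dir):
        return False
    return any(
        name == "checkpoint" or name.startswith("model.ckpt")
        for name in os.listdir(model_dir)
    )


def fresh_model_dir(preferred):
    """Return `preferred` if it is checkpoint-free, else a new sibling directory."""
    if not contains_checkpoints(preferred):
        os.makedirs(preferred, exist_ok=True)
        return preferred

    # Derive a sibling name from a timestamp; add a counter on collision.
    base = "{}_{}".format(preferred.rstrip(os.sep), time.strftime("%Y%m%d_%H%M%S"))
    candidate, counter = base, 0
    while os.path.exists(candidate):
        counter += 1
        candidate = "{}_{}".format(base, counter)
    os.makedirs(candidate)
    return candidate
```

The returned path is what you would then pass as `--model_dir` to `model_main.py`.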
Upvotes: 3