Reputation: 103
I am attempting to fine-tune a pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model using the TensorFlow Object Detection API for a custom task. I have downloaded the model from this location.
I have created a pipeline configuration for this model, specifying my training and evaluation TFRecord datasets and the path to the downloaded checkpoint as the fine_tune_checkpoint.
However, when I run the model_main_tf2.py script to start training, it fails with a ValueError raised while the fine-tune checkpoint is being loaded. The error is as follows:
Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 398, in load_fine_tune_checkpoint
    raise ValueError('Checkpoint version should be V2')
ValueError: Checkpoint version should be V2
This error suggests that there is a mismatch between the model architecture defined in my pipeline and the architecture of the pre-trained model. However, as far as I can see, my pipeline configuration is correctly set up for the Mask R-CNN Inception ResNet V2 1024x1024 model.
Furthermore, I have inspected the checkpoint file using the inspect_checkpoint.py script and it seems to include all the variables expected for this model. The downloaded checkpoint files include ckpt-0.index, ckpt-0.data-00000-of-00001, and checkpoint.
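The file names listed above imply a checkpoint prefix of ckpt-0, and in TF2-style setups fine_tune_checkpoint normally points at that prefix rather than at a single file or at the directory. As a rough sketch of that sanity check, the snippet below uses only the standard library to confirm that a prefix has its .index file and at least one .data shard. The function name and the temporary directory are illustrative assumptions, not part of the Object Detection API.

```python
import glob
import os
import tempfile


def checkpoint_prefix_is_complete(prefix: str) -> bool:
    """Return True if `prefix` has an .index file and at least one .data-* shard."""
    has_index = os.path.isfile(prefix + ".index")
    # glob.escape guards against special characters in the directory name.
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


# Demo on a throwaway directory that mimics the downloaded file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint", "ckpt-0.index", "ckpt-0.data-00000-of-00001"):
        open(os.path.join(tmp, name), "w").close()
    print(checkpoint_prefix_is_complete(os.path.join(tmp, "ckpt-0")))  # True
    print(checkpoint_prefix_is_complete(tmp))  # False: a directory is not a prefix
```

This only checks that the files exist on disk; it says nothing about which variables the checkpoint contains or whether the pipeline configuration is valid.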
I am running this on Google Colab with TensorFlow version 2.12.0 and Python version 3.10.0. I would greatly appreciate any guidance or solutions to this problem.
Steps to reproduce the behavior:
I expect the model training to begin by loading weights from the specified pre-trained model.
I have attached my pipeline.config file below:
pipeline.txt
Upvotes: 0
Views: 297
Reputation: 1107
Add the following fields to the train_config section, directly below your fine_tune_checkpoint entry:
fine_tune_checkpoint_version: V2
fine_tune_checkpoint_type: "detection"
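If you would rather patch the file from a Colab cell than edit it by hand, something along these lines may work. It is a minimal text-based sketch, not the Object Detection API's own config tooling: the helper names set_field and patch_config are made up for illustration, and it assumes each field sits on its own line, as in the stock pipeline.config files.

```python
import re


def set_field(text: str, name: str, value: str) -> str:
    """Set `name: value`, replacing an existing line or inserting a new one."""
    existing = re.compile(rf'^([ \t]*){name}[ \t]*:.*$', re.MULTILINE)
    if existing.search(text):
        # Field already present: overwrite its value, keep its indentation.
        return existing.sub(lambda m: f'{m.group(1)}{name}: {value}', text)
    # Otherwise insert right after the fine_tune_checkpoint line, reusing
    # its indentation. If that line is missing, the text is left unchanged.
    anchor = re.compile(r'^([ \t]*)fine_tune_checkpoint[ \t]*:.*$', re.MULTILINE)
    return anchor.sub(
        lambda m: f'{m.group(0)}\n{m.group(1)}{name}: {value}', text, count=1)


def patch_config(text: str) -> str:
    """Apply both fields suggested in this answer to pipeline.config text."""
    # The type is set first so the version lands directly under the checkpoint.
    text = set_field(text, 'fine_tune_checkpoint_type', '"detection"')
    text = set_field(text, 'fine_tune_checkpoint_version', 'V2')
    return text


# Illustrative fragment only; a real pipeline.config has many more fields.
sample = 'train_config {\n  batch_size: 1\n  fine_tune_checkpoint: "ckpt-0"\n}\n'
print(patch_config(sample))
```

To apply it to a real file, read pipeline.config into a string, pass it through patch_config, and write the result back. Running it twice is harmless because existing fields are overwritten rather than duplicated.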
Also note that there appears to be an open issue with this model:
https://github.com/tensorflow/models/issues/9546
Upvotes: 1