Reputation: 4379
I trained a TensorFlow model on a GPU cluster, saved the model using
saver = tf.train.Saver()
saver.save(sess, config.model_file, global_step=global_step)
and now I am trying to restore the model with
saver = tf.train.import_meta_graph('model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint(save_path))
for evaluation, on a different system. The issue is that saver.restore
yields the following error:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/jonpdeaton/Developer/BraTS18-Project/segmentation/evaluate.py", line 205, in <module>
main()
File "/Users/jonpdeaton/Developer/BraTS18-Project/segmentation/evaluate.py", line 162, in main
restore_and_evaluate(save_path, model_file, output_dir)
File "/Users/jonpdeaton/Developer/BraTS18-Project/segmentation/evaluate.py", line 127, in restore_and_evaluate
saver.restore(sess, tf.train.latest_checkpoint(save_path))
File "/Users/jonpdeaton/anaconda3/envs/BraTS/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1857, in latest_checkpoint
if file_io.get_matching_files(v2_path) or file_io.get_matching_files(
File "/Users/jonpdeaton/anaconda3/envs/BraTS/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 337, in get_matching_files
for single_filename in filename
File "/Users/jonpdeaton/anaconda3/envs/BraTS/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /afs/cs.stanford.edu/u/jdeaton/dfs/unet; No such file or directory
It seems as though some paths from the system the model was trained on were stored in the model or checkpoint file, and they are no longer valid on the system I am doing evaluation on. How do I restore a model (for evaluation) on a different machine after having copied the model-X.meta, model-X.index and checkpoint files?
Upvotes: 0
Views: 513
Reputation: 7844
By default, the Saver object will write the absolute model checkpoint paths into the checkpoint file. So the path returned by tf.train.latest_checkpoint(save_path) is the absolute path on your old machine.
Temporary solution:
1. Pass the actual model file path directly to the restore method rather than the result of tf.train.latest_checkpoint.
2. Manually modify the checkpoint file, which is a simple text file.
Long term solution:
saver = tf.train.Saver(save_relative_paths=True)
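For a model that has already been saved with absolute paths, one way to sidestep the stale entry is to build the checkpoint prefix on the evaluation machine and hand it to restore yourself, so tf.train.latest_checkpoint is never called. The following is only a sketch: it assumes TensorFlow 1.x (as in the question), assumes the files are named model-<step>.* as in model-1000.meta, and the helper names local_checkpoint_prefix and restore_for_evaluation are made up for illustration.

```python
import os


def local_checkpoint_prefix(save_path, step, basename="model"):
    """Build the checkpoint prefix (e.g. <save_path>/model-1000) for this machine.

    `basename` is an assumption taken from the 'model-1000.meta' file name.
    """
    return os.path.join(save_path, "{}-{}".format(basename, step))


def restore_for_evaluation(sess, save_path, step):
    """Restore graph and variables without reading the `checkpoint` file."""
    # Imported lazily so the path helper above works without TensorFlow.
    import tensorflow as tf  # assumes TensorFlow 1.x

    prefix = local_checkpoint_prefix(save_path, step)
    # Rebuild the graph from the .meta file, then load the variable values
    # from the .index/.data-* files that share the same prefix.
    saver = tf.train.import_meta_graph(prefix + ".meta")
    saver.restore(sess, prefix)
    return saver
```

Inside a session this would be called as restore_for_evaluation(sess, save_path, 1000). Note that the prefix passed to restore has no file extension.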
Upvotes: 1
Reputation: 4379
Open up the checkpoint file with your favorite text editor and simply change the absolute paths found therein to just filenames.
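If there are several checkpoint directories to fix, the same edit can be scripted. This is a rough sketch, assuming the checkpoint file uses the usual text layout with lines like model_checkpoint_path: "/old/path/model-1000" and all_model_checkpoint_paths: "..."; the function names strip_checkpoint_dirs and fix_checkpoint_file are hypothetical. It assumes the old machine used POSIX-style paths, as in the traceback above.

```python
import posixpath
import re

# Matches lines such as: model_checkpoint_path: "/old/machine/unet/model-1000"
# and:                   all_model_checkpoint_paths: "/old/machine/unet/model-1000"
_PATH_LINE = re.compile(r'^(\s*(?:all_)?model_checkpoint_paths?:\s*)"(.*)"\s*$')


def strip_checkpoint_dirs(text):
    """Return `checkpoint` file contents with each quoted path cut to its file name."""
    fixed_lines = []
    for line in text.splitlines():
        match = _PATH_LINE.match(line)
        if match:
            # Keep the field name, drop the directory part of the path.
            line = '{}"{}"'.format(match.group(1), posixpath.basename(match.group(2)))
        fixed_lines.append(line)  # non-matching lines pass through unchanged
    return "\n".join(fixed_lines) + "\n"


def fix_checkpoint_file(path):
    """Rewrite the `checkpoint` file at `path` in place (back it up first)."""
    with open(path) as f:
        fixed = strip_checkpoint_dirs(f.read())
    with open(path, "w") as f:
        f.write(fixed)
```

After running fix_checkpoint_file(os.path.join(save_path, "checkpoint")) (with os imported), the entries are bare file names, which tf.train.latest_checkpoint(save_path) should then resolve relative to save_path.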
Upvotes: 0