Reputation: 11
Restoring tensorflow model using saver.restore(sess,model_dir) is failing in Google Colaboratory.
Restore Code
tf.reset_default_graph()
sq_net = classifierNet(input_shape,out_classes,lr_rate,is_train)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
train_vars = tf.trainable_variables()
if model_dir is not None:
if os.path.exists("{}.index".format(model_dir)):
saver = tf.train.Saver()
saver.restore(sess, model_dir)
print("Model at %s restored" % model_dir)
else:
print("Model path does not exist, skipping...")
else:
print("Model path is None - Nothing to restore")
The above code produces the following error:
INFO:tensorflow:Restoring parameters from dri//colab//mod//
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1360 def _register_dead_handle(self, handle):
-> 1361 # Register a dead handle in the session. Delete the dead tensors when
1362 # the number of dead tensors exceeds certain threshold.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1339 # Nothing to do if we're using the new session interface
-> 1340 # TODO(skyewm): remove this function altogether eventually
1341 if self._created_with_new_api: return
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for dri//colab//mod//model.ckpt
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
<ipython-input-44-fb31e76f6b17> in <module>()
11 if os.path.exists("{}.index".format(model_dir)):
12 saver = tf.train.Saver(var_list=v0_vars)
---> 13 saver.restore(sess, model_dir)
14 print("Model at %s restored" % model_dir)
15 else:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
1753 # Create a saver.
1754 saver = tf.train.Saver(...variables...)
-> 1755 # Remember the training_op we want to run by adding it to a collection.
1756 tf.add_to_collection('train_op', train_op)
1757 sess = tf.Session()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
903 This is EXPERIMENTAL and subject to change.
904
--> 905 To use partial execution, a user first calls `partial_run_setup()` and
906 then a sequence of `partial_run()`. `partial_run_setup` specifies the
907 list of feeds and fetches that will be used in the subsequent
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1135 convertible to an ndarray) with matching element type and shape. See
1136 @{tf.Session.run} for details of the allowable feed key and value types.
-> 1137
1138 The returned callable will have the same return type as
1139 `tf.Session.run(fetches, ...)`. For example, if `fetches` is a `tf.Tensor`,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1353 tf_session.TF_ExtendGraph(
1354 self._session, graph_def.SerializeToString(), status)
-> 1355 self._opened = True
1356
1357 # The threshold to run garbage collection to delete dead tensors.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1372 fetches = []
1373 for deleter_key, tensor_handle in enumerate(tensors_to_delete):
-> 1374 holder, deleter = session_ops._get_handle_deleter(self.graph,
1375 deleter_key,
1376 tensor_handle)
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for dri//colab//mod//model.ckpt
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Caused by op 'save/RestoreV2', defined at:
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-44-fb31e76f6b17>", line 12, in <module>
saver = tf.train.Saver(var_list=v0_vars)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1293, in __init__
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1302, in build
"""
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1339, in _build
"""Deletes old checkpoints if necessary.
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
RuntimeError: If the SAVERS collection already has more than one items.
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 449, in _AddRestoreOps
filename_tensor: Tensor for the path of the file to load.
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 847, in bulk_restore
if all_model_checkpoint_paths is None:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1030, in restore_v2
shape_and_slices = _ops.convert_to_tensor(shape_and_slices, _dtypes.string)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
# Just being a bit paranoid here
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for dri//colab//mod//model.ckpt
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Mounting Drive Code:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p dri
!google-drive-ocamlfuse dri
After training my model on Google Colab, I want to save it and then restore it for testing purposes. I am able to save the model but the restore function throws the above error in Google Colab (This is working on my local machine though). Please suggest the correct way to do this.
Thanks in advance!
Upvotes: 1
Views: 2398
Reputation: 11
Download the model files into colab using the file id. Follow this - https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb.
Upvotes: 0