roishik

Reputation: 515

NotFoundError in TensorFlow when restoring a model

I built and saved a TensorFlow model, and now I'm trying to restore it and use it. I'm using older methods because this code was written for older versions of TensorFlow (I'm now on Python 3.5 and TensorFlow 1.8.0).

This is the piece of code where I'm saving the model:

sess = tf.InteractiveSession()
..>
#build the computational graph and all the layers. for example, the 1st layer:
W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
b_conv1 = bias_variable([first_conv_output_channels])
x_image = tf.reshape(x, [-1, patch_size, patch_size, 1]) # reshape x to a 4d tensor. 2,3 are the image dimensions, 4 is one color channel
..<
sess.run(tf.initialize_all_variables())
..>
#some more code    
..<
# saving the model:
saver = tf.train.Saver()
save_path = saver.save(sess, main_code_folder + 'code_files/Tensor_Flow/version1/built_networks/10 - testing_the_train_function/model.ckpt')

And this is how I'm restoring the model:

# initial parameters + build layers for tensorboard visualisation. for example, layer 1:
with tf.name_scope('conv_layer1'):
    # build the first layer
    with tf.name_scope('weights'):
        W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
        variable_summaries(W_conv1)
    with tf.name_scope('biases'):
        b_conv1 = bias_variable([first_conv_output_channels])
        variable_summaries(b_conv1)

    x_image = tf.reshape(x, [-1, patch_size, patch_size, 1]) # reshape x to a 4d tensor. 2,3 are the image dimensions, 4 is one color channel

    with tf.name_scope('Wx_plus_b'):
        Wx_plus_b=conv2d(x_image, W_conv1) + b_conv1
        variable_summaries(Wx_plus_b)

    # apply the layers
    h_conv1 = tf.nn.relu(Wx_plus_b)
...
saver = tf.train.Saver()
savepath = make_folder_name_Win_format(main_code_folder + 'code_files/Tensor_Flow/version1/built_networks/10 - testing_the_train_function/')
saver.restore(sess, save_path = savepath + '{}'.format(model_name))

When I run this code I'm running into the following error:

tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint

I found some similar questions that had been solved and tried their solutions, but none of them worked. As far as I can see, the directory name is the same in both scripts, and the model appears to be saved properly. Can you advise me on how to confirm both?

I would really appreciate your help. Thanks!

Full error log below:

2018-06-30 00:53:02.524332: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv_layer1/biases/Variable not found in checkpoint
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Roi/Desktop/Code_Win_Ver/code_files/Tensor_Flow/version1/find_labels_for_db.py", line 252, in <module>
    saver.restore(sess, save_path = savepath + '{}'.format(model_name))
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1802, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'save/RestoreV2', defined at:
  File "C:/Users/Roi/Desktop/Code_Win_Ver/code_files/Tensor_Flow/version1/find_labels_for_db.py", line 247, in <module>
    saver = tf.train.Saver()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1338, in __init__
    self.build()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Python35\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1546, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key conv_layer1/biases/Variable not found in checkpoint
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
     [[Node: save/RestoreV2/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]


Process finished with exit code 1

Upvotes: 0

Views: 1287

Answers (1)

Diana

Reputation: 387

The error occurs because a variable that the restore op asks for does not exist in your checkpoint. To resolve this, create your saver before you create the corresponding variable:

saver = tf.train.Saver()
conv_layer1 = ...

saver.restore(sess, save_path=...)

Now, when you call save after training (or at any later point), the newly added variables, e.g. conv_layer1/biases/Variable, and the already existing variables will all be written to this checkpoint.
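
To confirm which keys a checkpoint actually holds, you can list them and compare them against the variable names in the current graph. This is a minimal sketch, assuming `tf.train.list_variables` is available in your TensorFlow version. `checkpoint_keys` and `missing_from_checkpoint` are hypothetical helper names, and the example names at the bottom are illustrative, not read from your checkpoint:

```python
def checkpoint_keys(checkpoint_prefix):
    """Return the variable names stored in a checkpoint (e.g. '.../model.ckpt')."""
    import tensorflow as tf  # imported lazily so the pure helper below works without TF
    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


def missing_from_checkpoint(graph_names, ckpt_names):
    """Names the current graph wants to restore that the checkpoint does not contain."""
    available = set(ckpt_names)
    return sorted(name for name in graph_names if name not in available)


# Illustrative names only (assumption): a graph built inside name scopes
# compared with a checkpoint saved from a graph without them.
graph_names = ["conv_layer1/weights/Variable", "conv_layer1/biases/Variable"]
ckpt_names = ["Variable", "Variable_1"]
print(missing_from_checkpoint(graph_names, ckpt_names))
```

In a real session, `graph_names` could come from `[v.op.name for v in tf.global_variables()]` and `ckpt_names` from `checkpoint_keys(...)` called with the same path you pass to `saver.restore`. Any name this prints is one that restore would report as not found.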

Once the checkpoint has been re-saved this way, rearrange your code so that the saver is created after the variables that caused the problem, like this:

conv_layer1 = ...
saver = tf.train.Saver()

saver.restore(sess, save_path=...)
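
If the variable names still differ between the two scripts (for instance because the restoring graph wraps its variables in `tf.name_scope` while the saving graph did not), `tf.train.Saver` also accepts a dict that maps checkpoint keys to variables. The sketch below rests on an assumption that the question does not confirm: that the saving graph created unnamed `tf.Variable`s in a single scope, so they received TensorFlow's default names `Variable`, `Variable_1`, and so on in creation order. `default_variable_names` and `saver_for_checkpoint_names` are hypothetical helpers:

```python
def default_variable_names(count):
    """Default names TensorFlow gives unnamed variables in one scope, in creation order."""
    return ["Variable" if i == 0 else "Variable_{}".format(i) for i in range(count)]


def saver_for_checkpoint_names(variables_in_creation_order):
    """Build a Saver that reads each variable from its assumed default-named key."""
    import tensorflow as tf  # imported lazily; only needed when a Saver is built
    names = default_variable_names(len(variables_in_creation_order))
    # var_list as a dict: checkpoint key -> variable in the current graph
    return tf.train.Saver(var_list=dict(zip(names, variables_in_creation_order)))


print(default_variable_names(3))
```

You would pass the variables in the same order the saving script created them, then call `restore` on the returned saver as usual. Check the generated names against the checkpoint's real keys first, since any extra variable created in the saving script shifts the numbering.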

Upvotes: 1
