Reputation: 1519
I am trying to launch a distributed TensorFlow job and get the following error. My code looks like this:
sv = tf.train.Supervisor(is_chief=(task_index == 0), logdir="/tmp/train_logs", init_op=init_op,
                         summary_op=summary_op, saver=saver, global_step=global_step, save_model_secs=600)
with sv.managed_session(server.target) as sess:
    step = 0
    while not sv.should_stop() and step < nnc.steps:
        mini_batches = random_mini_batches(x_train, y_train, mini_batch_size)
        for mini_batch in mini_batches:
            (batch_x, batch_y) = mini_batch
            _, step = sess.run([train_op, global_step], feed_dict={x: batch_x, y: batch_y})
The error is raised inside the random_mini_batches function, but I don't understand how or why. random_mini_batches is written in pure Python + NumPy, with nothing related to TensorFlow, and x_train and y_train were not used before this point.
Here is the error that I get:
File "/Users/curr_user/PycharmProjects/curr_project/src/nn.py", line 36, in random_mini_batches
num_complete_minibatches = int(math.floor(m / mini_batch_size)) # number of mini batches of size mini_batch_size
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 880, in r_binary_op_wrapper
x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 611, in convert_to_tensor
as_ref=False)
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 106, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2582, in create_op
self._check_not_finalized()
File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2290, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
Any help would be highly appreciated! Thanks
Upvotes: 1
Views: 2401
Reputation: 53768
It's not in your question, but I suspect that mini_batch_size is a constant tensor. Although random_mini_batches is pure Python and NumPy, TensorFlow overloads many operators for tensors, so this line
num_complete_minibatches = int(math.floor(m / mini_batch_size))
is in fact performing a reflected division (__rdiv__, see r_binary_op_wrapper in your traceback) on a tensor, which forces m to be converted to a tensor as well. However, tf.train.Supervisor() finalizes the graph, i.e. no more nodes can be created, so the conversion fails.
The solution is to make mini_batch_size an ordinary Python constant and to make sure no tensors are used inside random_mini_batches.
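One way to catch this early is a small guard at the top of the batching code. This is only a sketch: count_complete_minibatches is a hypothetical helper, not part of your random_mini_batches, and it assumes m and mini_batch_size are meant to be plain integers.

```python
import numbers


def count_complete_minibatches(m, mini_batch_size):
    """Return how many full mini batches fit into m examples.

    Hypothetical helper: rejects anything that is not a plain integer
    (for example a tensor) before any arithmetic happens.
    """
    if not isinstance(mini_batch_size, numbers.Integral):
        raise TypeError(
            "mini_batch_size must be a plain integer, got %s"
            % type(mini_batch_size).__name__
        )
    if mini_batch_size <= 0:
        raise ValueError("mini_batch_size must be positive")
    # Integer floor division; no operator overloading can kick in here
    # because both operands are ordinary integers.
    return m // mini_batch_size


full_batches = count_complete_minibatches(1000, 64)
```

With a check like this, passing a tensor by accident fails immediately with a clear TypeError instead of a graph error deep inside TensorFlow.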
Upvotes: 1