Reputation: 1409
I am trying to save variables to checkpoints to introduce fault tolerance into my program, and I am trying to achieve this with MonitoredTrainingSession. The following is my configuration:
import tensorflow as tf

global_step = tf.Variable(10, trainable=False, name='global_step')
x = tf.constant(2)

with tf.device("/job:local/task:0"):
    y1 = tf.Variable(x + 300)

with tf.device("/job:local/task:1"):
    y2 = tf.Variable(x**2)

with tf.device("/job:local/task:2"):
    y3 = tf.Variable(5*x)

with tf.device("/job:local/task:3"):
    y0 = tf.Variable(x - 66)

y = y0 + y1 + y2 + y3

model = tf.global_variables_initializer()
saver = tf.train.Saver(sharded=True)

chief = tf.train.ChiefSessionCreator(scaffold=None, master='grpc://localhost:2222', config=None, checkpoint_dir='/home/tensorflow/codes/checkpoints')
summary_hook = tf.train.SummarySaverHook(save_steps=None, save_secs=10, output_dir='/home/tensorflow/codes/savepoints', summary_writer=None, scaffold=None, summary_op=tf.summary.tensor_summary(name="y", tensor=y))
saver_hook = tf.train.CheckpointSaverHook(checkpoint_dir='/home/tensorflow/codes/checkpoints', save_secs=None, save_steps=True, saver=saver, checkpoint_basename='model.ckpt', scaffold=None)

# with tf.train.MonitoredSession(session_creator=ChiefSessionCreator, hooks=[saver_hook, summary_hook]) as sess:
with tf.train.MonitoredTrainingSession(master='grpc://localhost:2222', is_chief=True, checkpoint_dir='/home/tensorflow/codes/checkpoints',
                                       scaffold=None, hooks=[saver_hook, summary_hook], chief_only_hooks=None,
                                       save_checkpoint_secs=None, save_summaries_steps=True, config=None) as sess:
    while not sess.should_stop():
        sess.run(tf.global_variables_initializer())

    while not sess.should_stop():
        result = sess.run(y)
        print(result)
I get the following RuntimeError, which I am unable to resolve:
Traceback (most recent call last):
File "add_1.py", line 39, in <module>
sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1187, in global_variables_initializer
return variables_initializer(global_variables())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1169, in variables_initializer
return control_flow_ops.group(*[v.initializer for v in var_list], name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2773, in group
deps.append(_GroupControlDeps(dev, ops_on_device[dev]))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2721, in _GroupControlDeps
return no_op(name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_control_flow_ops.py", line 186, in no_op
result = _op_def_lib.apply_op("NoOp", name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2199, in create_op
self._check_not_finalized()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1925, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.
Upvotes: 8
Views: 29216
Reputation: 3224
This may not be recommended for your use case, but it is possible to unfinalize a Graph:
sess.graph._unsafe_unfinalize()
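For example, a minimal sketch of where that call could go, assuming the question's graph has already been built in the default graph; note that _unsafe_unfinalize() is a private API and may change between TensorFlow versions:

with tf.train.MonitoredTrainingSession(checkpoint_dir='/home/tensorflow/codes/checkpoints') as sess:
    sess.graph._unsafe_unfinalize()               # lift the "finalized" flag on the session's graph
    init_op = tf.global_variables_initializer()   # adding this op no longer raises RuntimeError
    sess.run(init_op)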
Upvotes: 10
Reputation: 1
Since your aim is to use MonitoredTrainingSession to get you checkpointing, the usage is much simpler than your example:
import tensorflow as tf

global_step = tf.contrib.framework.get_or_create_global_step()

x = tf.constant(2)
y1 = x + 300
y2 = x**2
y3 = x * 5
y0 = x - 66
y = y0 + y1 + y2 + y3

step = tf.assign_add(global_step, 1)

with tf.train.MonitoredTrainingSession(checkpoint_dir='/tmp/checkpoints') as sess:
    while not sess.should_stop():
        result, i = sess.run([y, step])
        print(result, i)
A few notes:

- MonitoredTrainingSession handles the checkpoint saving (and restoring) for you.
- With save_checkpoint_secs you can change the frequency of checkpointing from the 10 minute default (see the sketch after this list). I find a higher frequency isn't worth it: saving checkpoints isn't free, so very frequent checkpointing will end up slowing training down.
- ChiefSessionCreator and the gRPC config are only needed for distributed running (see here for a description of the concepts). Similarly with assigning ops to specific devices - make sure you really need to do this before using it, as it can slow things down if you're not careful.
- You don't need to wrap things in tf.Variable() - they already are variables.
- You can use save_summaries_steps for monitoring training with TensorBoard, but by default that'll happen every 100 steps anyway.
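For illustration, a minimal sketch of the same loop with a shorter checkpoint interval; the 60-second value is only an example, and y and step are assumed to be the ones defined above:

with tf.train.MonitoredTrainingSession(checkpoint_dir='/tmp/checkpoints',
                                       save_checkpoint_secs=60) as sess:
    # Identical loop to the snippet above; only the checkpoint interval differs.
    while not sess.should_stop():
        result, i = sess.run([y, step])
        print(result, i)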
Upvotes: 0
Reputation: 91
If you want to re-initialize the graph inside a loop, you can reset the default graph and create a new one at the top of the loop:
import tensorflow as tf

tf.reset_default_graph()    # discard the old (possibly finalized) default graph
tf.Graph().as_default()     # create a fresh graph (typically used as: with tf.Graph().as_default():)
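A minimal sketch of how that could look per loop iteration; the tiny graph built here is only a placeholder, not the question's full setup:

import tensorflow as tf

for attempt in range(3):
    tf.reset_default_graph()          # drop whatever graph the previous iteration finalized
    with tf.Graph().as_default():     # build this iteration's ops into a fresh graph
        x = tf.constant(2)
        y = x + 300
        with tf.train.MonitoredTrainingSession() as sess:
            print(attempt, sess.run(y))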
Upvotes: 8
Reputation: 1532
The root cause of your error seems to be that MonitoredTrainingSession has finalized (frozen) the graph, so your tf.global_variables_initializer() is no longer able to modify it.
Having said that, there are multiple things that require attention:
1) Why do you try to repeatedly initialize all variables here?
while not sess.should_stop():
    sess.run(tf.global_variables_initializer())
2) It seems some of your code is already included in MonitoredTrainingSession, e.g. ChiefSessionCreator. Can you please take another look at the code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/monitored_session.py#L243) or search for its sample usage and see how MonitoredTrainingSession is supposed to be used?
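For reference, a minimal sketch of the intended usage: MonitoredTrainingSession creates the session (including the chief session creator) and runs the variable initializers itself, so no explicit tf.global_variables_initializer() call is needed afterwards:

import tensorflow as tf

y1 = tf.Variable(302, name='y1')      # a variable that would normally need initializing
with tf.train.MonitoredTrainingSession() as sess:
    print(sess.run(y1))               # already initialized by the session; prints 302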
Upvotes: 11