Reputation: 33
I'm trying to train an autoencoder with an MSE loss function in TensorFlow r1.2, but I keep getting a FailedPreconditionError
stating that one of the variables involved in computing the MSE is uninitialized (see the full stack trace below). I'm running this in a Jupyter notebook with Python 3.
I trimmed my code down to a minimal example, as follows:
import sys
import tensorflow as tf
import numpy as np
from functools import partial
# specify network
def reset_graph(seed=0):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()
n_inputs = 100
n_hidden = 6
n_outputs = n_inputs
learning_rate = 0.001
l2_reg = 0.001
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
he_init = tf.contrib.layers.variance_scaling_initializer()
l2_regularizer = tf.contrib.layers.l2_regularizer(l2_reg)
my_dense_layer = partial(tf.layers.dense,
                         activation=tf.nn.elu,
                         kernel_initializer=he_init,
                         kernel_regularizer=l2_regularizer)

hidden1 = my_dense_layer(X, n_hidden)
outputs = my_dense_layer(hidden1, n_outputs, activation=None)
reconstruction_loss = tf.reduce_mean(tf.metrics.mean_squared_error(X, outputs))
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
# generate 1000 random examples
sample_X = np.random.rand(1000, 100)
# train network
n_epochs = 10
batch_size = 50
with tf.Session() as sess:
    sess.run(init)  # init.run()
    for epoch in range(n_epochs):
        n_batches = sample_X.shape[0] // batch_size
        for iteration in range(n_batches):
            start_idx = iteration * batch_size
            if iteration == n_batches - 1:
                end_idx = sample_X.shape[0]
            else:
                end_idx = start_idx + batch_size
            sys.stdout.flush()
            X_batch = sample_X[start_idx:end_idx]
            sess.run(training_op, feed_dict={X: X_batch})
        loss_train = reconstruction_loss.eval(feed_dict={X: X_batch})
        print(round(loss_train, 5))
When I replace the line that defines reconstruction_loss with a version that doesn't use tf.metrics, as follows:

reconstruction_loss = tf.reduce_mean(tf.square(tf.norm(outputs - X)))

I don't get the exception.
I've checked several similar SO questions, but none of them solved my problem. For example, one possible cause, suggested in an answer to FailedPreconditionError: Attempting to use uninitialized in Tensorflow, is failing to initialize all the variables in the TF graph; but my script initializes all TF variables with init = tf.global_variables_initializer() and then sess.run(init). Another possible cause is that the Adam optimizer creates its own variables, which need to be initialized after the optimizer is specified (see Tensorflow: Using Adam optimizer). However, my script defines the variable initializer after the optimizer, as suggested in the accepted answer to that question, so that also can't be my problem.
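For completeness, one diagnostic I can run right after sess.run(init) is to list whatever the session still considers uninitialized (just a sketch, reusing init from the script above):

with tf.Session() as sess:
    sess.run(init)
    # tf.report_uninitialized_variables() checks global and local variables by default
    # and returns the names of any that have not yet been initialized.
    print(sess.run(tf.report_uninitialized_variables()))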
Can anyone spot anything wrong with my script or suggest things to try to suss out the cause of this error?
Below is the stack trace from the error.
---------------------------------------------------------------------------
FailedPreconditionError Traceback (most recent call last)
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1138 try:
-> 1139 return fn(*args)
1140 except errors.OpError as e:
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1120 feed_dict, fetch_list, target_list,
-> 1121 status, run_metadata)
1122
~\AppData\Local\Continuum\Anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
88 try:
---> 89 next(self.gen)
90 except StopIteration:
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
FailedPreconditionError: Attempting to use uninitialized value mean_squared_error/total
[[Node: mean_squared_error/total/read = Identity[T=DT_FLOAT, _class=["loc:@mean_squared_error/total"], _device="/job:localhost/replica:0/task:0/cpu:0"](mean_squared_error/total)]]
During handling of the above exception, another exception occurred:
FailedPreconditionError Traceback (most recent call last)
<ipython-input-55-aac61c488ed8> in <module>()
64 sess.run(training_op, feed_dict={X: X_batch})
65
---> 66 loss_train = reconstruction_loss.eval(feed_dict={X: X_batch})
67 print(round(loss_train, 5))
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py in eval(self, feed_dict, session)
604
605 """
--> 606 return _eval_using_default_session(self, feed_dict, self.graph, session)
607
608
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py in _eval_using_default_session(tensors, feed_dict, graph, session)
3926 "the tensor's graph is different from the session's "
3927 "graph.")
-> 3928 return session.run(tensors, feed_dict)
3929
3930
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
787 try:
788 result = self._run(None, fetches, feed_dict, options_ptr,
--> 789 run_metadata_ptr)
790 if run_metadata:
791 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
995 if final_fetches or final_targets:
996 results = self._do_run(handle, final_targets, final_fetches,
--> 997 feed_dict_string, options, run_metadata)
998 else:
999 results = []
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1130 if handle is None:
1131 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1132 target_list, options, run_metadata)
1133 else:
1134 return self._do_call(_prun_fn, self._session, handle, feed_dict,
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1150 except KeyError:
1151 pass
-> 1152 raise type(e)(node_def, op, message)
1153
1154 def _extend_graph(self):
FailedPreconditionError: Attempting to use uninitialized value mean_squared_error/total
[[Node: mean_squared_error/total/read = Identity[T=DT_FLOAT, _class=["loc:@mean_squared_error/total"], _device="/job:localhost/replica:0/task:0/cpu:0"](mean_squared_error/total)]]
Caused by op 'mean_squared_error/total/read', defined at:
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
app.launch_new_instance()
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
app.start()
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tornado\ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2698, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2802, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-55-aac61c488ed8>", line 32, in <module>
reconstruction_loss = tf.reduce_mean(tf.metrics.mean_squared_error(X, outputs))
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\metrics_impl.py", line 1054, in mean_squared_error
updates_collections, name or 'mean_squared_error')
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\metrics_impl.py", line 331, in mean
total = _create_local('total', shape=[])
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\metrics_impl.py", line 196, in _create_local
validate_shape=validate_shape)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1679, in variable
caching_device=caching_device, name=name, dtype=dtype)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 200, in __init__
expected_shape=expected_shape)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 319, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1303, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value mean_squared_error/total
[[Node: mean_squared_error/total/read = Identity[T=DT_FLOAT, _class=["loc:@mean_squared_error/total"], _device="/job:localhost/replica:0/task:0/cpu:0"](mean_squared_error/total)]]
Upvotes: 2
Views: 951
Reputation: 32111
Looks like you're doing everything right with initialization, so I suspect the problem is that you're using tf.metrics.mean_squared_error
incorrectly.
The metrics package lets you compute a value, but also accumulate that value over multiple calls to sess.run. Note the return value of tf.metrics.mean_squared_error in the docs:
https://www.tensorflow.org/api_docs/python/tf/metrics/mean_squared_error
You get back two things: mean_squared_error, as you appear to expect, and an update_op. The update_op is what you actually ask TensorFlow to run; each run accumulates the squared error, and each time you evaluate the mean_squared_error tensor you get the value accumulated so far. The accumulators (mean_squared_error/total and mean_squared_error/count) are created as local variables, which is why tf.global_variables_initializer() doesn't touch them and you get the FailedPreconditionError. When you want to reset the accumulated value you run sess.run(tf.local_variables_initializer()) (note local, not global, to clear the "local" variables as the metrics package defines them).
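To make that concrete, here's a rough sketch of how the metrics version is usually wired up for evaluation (eval_batches is just a stand-in for however you iterate over your data):

# Sketch of the intended usage pattern for tf.metrics.mean_squared_error.
mse, mse_update_op = tf.metrics.mean_squared_error(labels=X, predictions=outputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())  # initializes mean_squared_error/total and /count

    for X_batch in eval_batches:                # eval_batches: placeholder for your own batch iterator
        sess.run(mse_update_op, feed_dict={X: X_batch})  # accumulates total and count

    print(sess.run(mse))  # MSE accumulated over all batches seen since the last reset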
I don't think the metrics package was intended to be used the way you're using it. Your intention, I think, was to compute the MSE of just the current batch as your loss, not to accumulate the value over multiple calls; I'm not even sure how differentiation would work with respect to an accumulated value like that.
So I think the answer to your question is: don't use the metrics package this way. Use metrics for reporting, e.g. for accumulating results over multiple iterations of a test dataset, not for building a loss function.
I think what you mean to use is tf.losses.mean_squared_error.
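In your script that would just mean swapping the loss line, something like this (a sketch, keeping the rest of your graph unchanged):

# tf.losses.mean_squared_error returns a plain scalar tensor with no hidden state,
# so tf.global_variables_initializer() is all you need.
reconstruction_loss = tf.losses.mean_squared_error(labels=X, predictions=outputs)

reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)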
Upvotes: 1