Thermodynamix

Reputation: 367

TensorFlow cannot use apply_gradients with AdamOptimizer

I want to use the AdamOptimizer, but I also want to edit my gradients every step.

The typical usage is as follows:

import tensorflow as tf

# loss, learning_rate, and feed_dict are defined elsewhere
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
sess.run(train_step, feed_dict=feed_dict)

This applies a single training step with the AdamOptimizer.

I want to modify the gradients every step, so I extract them and then reinsert them with the following code:

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)
train_opt = opt.apply_gradients(grads_and_vars)
sess.run(train_opt, feed_dict=feed_dict)

I would normally apply some operations to grads_and_vars, but I'm just trying to get this to work first. The previous code fails at sess.run(train_opt, feed_dict=feed_dict) because of the following error:

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value beta1_power_1
     [[Node: beta1_power_1/read = Identity[T=DT_FLOAT, _class=["loc:@Variable"], _device="/job:localhost/replica:0/task:0/cpu:0"](beta1_power_1)]]

The error is triggered by train_opt = opt.apply_gradients(grads_and_vars). Am I not applying the gradients correctly?
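One way to confirm what is actually uninitialized (a purely diagnostic sketch, reusing the sess from the first block; tf.report_uninitialized_variables is a standard TF 1.x helper):

# Lists the names of variables that have not been initialized yet;
# with the error above this should include Adam's accumulators
# (beta1_power, beta2_power, and the per-variable slots).
print(sess.run(tf.report_uninitialized_variables()))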

There is no error with the GradientDescentOptimizer, so I know this must be the right way to extract the gradients and then reinsert them for a training step.
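As an example of the kind of edit I want to make, here is a sketch with gradient clipping inserted between compute_gradients and apply_gradients (the clipping threshold of 1.0 is arbitrary and just for illustration):

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)
# Modify the gradients before reinserting them; compute_gradients can
# return None for variables that do not affect loss, so skip those.
clipped_grads_and_vars = [
    (tf.clip_by_value(g, -1.0, 1.0) if g is not None else g, v)
    for g, v in grads_and_vars
]
train_opt = opt.apply_gradients(clipped_grads_and_vars)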

Is there something I'm missing? How can I use the AdamOptimizer this way?

EDIT: As mentioned, the second code block does work with the GradientDescentOptimizer, but it is about 10 times slower than the first block. Is there a way to speed that up?

Upvotes: 0

Views: 922

Answers (1)

Jie.Zhou

Reputation: 1318

Run sess.run(tf.local_variables_initializer()); there are local variables in Adam, and you need to initialize them.
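For reference, a minimal sketch of the ordering, assuming loss and feed_dict from the question: the initializers are run only after apply_gradients has been built, because that call is what creates Adam's extra variables (beta1_power, beta2_power, and the per-variable slots).

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)
train_opt = opt.apply_gradients(grads_and_vars)

sess = tf.Session()
# Run the initializers after the Adam op exists so its variables are covered.
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

sess.run(train_opt, feed_dict=feed_dict)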

Upvotes: 1
