Maybe

Reputation: 2279

Adam in TensorFlow: where do the moment estimates happen?

I know that optimizers in TensorFlow split minimize into compute_gradients and apply_gradients. However, optimization algorithms like Adam generally process the gradients with momentum and some other techniques, as the following figure suggests (thanks @kmario23 for providing the figure).

[figure: the Adam algorithm, including the first and second moment estimates]

I wonder when these techniques are applied to the gradients. Are they applied in compute_gradients or in apply_gradients?

Update

import tensorflow as tf

sess = tf.Session()
x = tf.placeholder(tf.float32, [None, 1])
y = tf.layers.dense(x, 1)
loss = tf.losses.mean_squared_error(tf.ones_like(y), y)
opt = tf.train.AdamOptimizer()
grads = opt.compute_gradients(loss)  # list of (gradient, variable) pairs
sess.run(tf.global_variables_initializer())

# Evaluate the gradients twice with the same input:
print(sess.run(grads, feed_dict={x: [[1]]}))
print(sess.run(grads, feed_dict={x: [[1]]}))

The above code outputs the same result twice. Does that suggest that the moment estimates are computed in apply_gradients? Because, IMHO, if the moment estimates were computed in compute_gradients, then after the first print statement the first and second moments would already have been updated, which should produce a different result in the second print statement.
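One direct way to check this (a minimal sketch continuing the snippet above; 'm' and 'v' are the slot names the TF1 AdamOptimizer uses for its moment estimates):

train_op = opt.apply_gradients(grads)       # creates the moment slot variables
sess.run(tf.global_variables_initializer())

var = tf.trainable_variables()[0]           # e.g. the dense layer's kernel
m = opt.get_slot(var, 'm')                  # first moment estimate
v = opt.get_slot(var, 'v')                  # second moment estimate

print(sess.run([m, v]))                     # all zeros after initialization
sess.run(grads, feed_dict={x: [[1]]})       # compute_gradients alone...
print(sess.run([m, v]))                     # ...leaves the moments untouched
sess.run(train_op, feed_dict={x: [[1]]})    # apply_gradients...
print(sess.run([m, v]))                     # ...is what updates them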

Upvotes: 1

Views: 229

Answers (2)

Maybe

Reputation: 2279

compute_gradients computes only the gradients; all the additional operations specific to a given optimization algorithm are done in apply_gradients. The code in the update is one piece of evidence; another is the following figure cropped from TensorBoard, where the gradients node corresponds to compute_gradients and the Adam node corresponds to apply_gradients.

[figure: TensorBoard graph with separate gradients and Adam nodes]
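This split can also be checked programmatically (a minimal sketch, not from the original answer): the ops that apply_gradients adds to the graph all carry the optimizer's Adam name scope.

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, [None, 1])
    y = tf.layers.dense(x, 1)
    loss = tf.losses.mean_squared_error(tf.ones_like(y), y)
    opt = tf.train.AdamOptimizer()
    grads = opt.compute_gradients(loss)   # adds ops under the gradients/ scope
    before = {op.name for op in g.get_operations()}
    opt.apply_gradients(grads)            # adds ops under the Adam/ scope
    added = [op.name for op in g.get_operations() if op.name not in before]
    print([name for name in added if 'Adam' in name])  # moment updates live here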

Upvotes: 1

kmario23

Reputation: 61455

Below is the Adam algorithm as presented in the Deep Learning book. As for your question, the important thing to note here is the parameter update Δθ (written with a capital delta) computed in the second-to-last step: it is built from the bias-corrected moment estimates, not from the raw gradient alone.

[figure: the Adam algorithm from the Deep Learning book]
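For concreteness, here is a sketch of one such update in NumPy (a hypothetical helper, not from the answer; eps, rho1, rho2, and delta follow the book's notation):

import numpy as np

def adam_step(theta, grad, s, r, t, eps=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    # One Adam update following the book's pseudocode (s, r: running moments).
    t = t + 1
    s = rho1 * s + (1 - rho1) * grad             # biased first moment estimate
    r = rho2 * r + (1 - rho2) * grad * grad      # biased second moment estimate
    s_hat = s / (1 - rho1 ** t)                  # correct first-moment bias
    r_hat = r / (1 - rho2 ** t)                  # correct second-moment bias
    delta_theta = -eps * s_hat / (np.sqrt(r_hat) + delta)  # the second-to-last step
    return theta + delta_theta, s, r, t          # apply the update to theta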

As for how TensorFlow computes this, the optimization (i.e. minimize) is a two-step process.

In the first step, compute_gradients gathers all the necessary ingredients for the final update. The second step, apply_gradients, then applies the update to the parameters based on the gradients computed in the first step and the learning rate (lr).
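In code, the two steps look like this (a minimal sketch; in TF1, minimize is essentially these two calls chained):

# opt.minimize(loss) is equivalent to:
grads_and_vars = opt.compute_gradients(loss)    # step 1: compute the raw gradients
train_op = opt.apply_gradients(grads_and_vars)  # step 2: update the parameters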

Upvotes: 2
