SantoshGupta7

Reputation: 6197

Tensorflow: Is the learning rate you set in Adam and Adagrad just the initial learning rate?

I'm reading this blog

https://smist08.wordpress.com/2016/10/04/the-road-to-tensorflow-part-10-more-on-optimization/

where it lists TensorFlow's optimizers and the learning rates passed to them:

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

optimizer = tf.train.AdadeltaOptimizer(starter_learning_rate).minimize(loss)

optimizer = tf.train.AdagradOptimizer(starter_learning_rate).minimize(loss)     # promising

optimizer = tf.train.AdamOptimizer(starter_learning_rate).minimize(loss)      # promising

optimizer = tf.train.MomentumOptimizer(starter_learning_rate, 0.001).minimize(loss) # diverges

optimizer = tf.train.FtrlOptimizer(starter_learning_rate).minimize(loss)    # promising

optimizer = tf.train.RMSPropOptimizer(starter_learning_rate).minimize(loss)   # promising

It says that the learning rate you input is only the starter learning rate. Does that mean that if you change the learning rate in the middle of training, that change will have no effect because it's not using the starter learning rate anymore?

I tried looking at the API docs, but they don't specify this.

Upvotes: 2

Views: 1520

Answers (1)

Sraw

Reputation: 20206

A short answer:

Except for the first line (plain gradient descent), the rest are adaptive gradient-descent optimizers, meaning they automatically adjust the effective learning rate at every step, typically from statistics they accumulate over the gradients. So the learning rate you pass in is only used as the initial (base) value.
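If you do want to control the base learning rate yourself during training, you can pass a Tensor instead of a constant. Here is a minimal self-contained sketch using the same TF 1.x API as above; the toy loss and the two-phase schedule are purely illustrative:

import tensorflow as tf

# Toy problem: fit a single scalar weight (illustrative only).
w = tf.Variable(5.0)
loss = tf.square(w - 3.0)

lr = tf.placeholder(tf.float32, shape=[])  # base learning rate, fed each step
train_op = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):
        # You can feed a different base learning rate at any step;
        # Adam still rescales it per parameter internally.
        current_lr = 0.01 if step < 1000 else 0.001
        sess.run(train_op, feed_dict={lr: current_lr})
    print(sess.run(w))  # should end up close to 3.0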

Take AdamOptimizer as an example; you can learn the details in this article.
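For a rough intuition of what "adaptive" means there, here is a plain NumPy sketch of a single Adam update, following the Kingma & Ba formulation rather than TensorFlow's actual implementation; the function and variable names are my own:

import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # `lr` is the value you passed to the optimizer; dividing by sqrt(v_hat)
    # rescales the step for every parameter at every iteration.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v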

Upvotes: 3
