guser

Reputation: 296

SGD convergence test using learning rates

Can anyone give an explanation for the convergence test presented in the 8th minute of this lecture by Hugo Larochelle?

Upvotes: 0

Views: 483

Answers (1)

Pablo EM

Reputation: 6689

These conditions ensure asymptotic convergence. To converge, we must be able to keep updating the approximate solution indefinitely, which requires the learning rate to stay strictly greater than zero. That is what the first condition (the learning rates sum to infinity, Σ_t α_t = ∞) guarantees: the step sizes never vanish so quickly that the iterates can only travel a bounded total distance.

On the other hand, besides being able to update the approximate solution indefinitely, we also want to settle ever closer to the optimum. For that, the steps must become smaller and smaller so the gradient noise averages out. That is what the second condition (the squared learning rates have a finite sum, Σ_t α_t² < ∞) enforces: the learning rate has to shrink towards zero fast enough.

Both conditions are required not only in SGD, but in many other stochastic approximation methods. They are sometimes referred to as the Robbins-Monro conditions, after the Robbins–Monro algorithm.
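For intuition, here is a minimal sketch (not from the lecture; the quadratic objective, noise level and step count are illustrative assumptions) comparing a constant learning rate with a 1/t schedule, which satisfies both conditions since Σ 1/t diverges while Σ 1/t² is finite:

    # Minimal sketch: SGD on a noisy 1-D quadratic, comparing a constant
    # learning rate with a 1/t schedule that satisfies both Robbins-Monro
    # conditions (sum of alpha_t diverges, sum of alpha_t^2 is finite).
    import random

    def noisy_gradient(x, noise_std=1.0):
        # Objective: f(x) = 0.5 * (x - 3)^2, so the exact gradient is (x - 3).
        # Gaussian noise mimics the stochasticity of minibatch gradients.
        return (x - 3.0) + random.gauss(0.0, noise_std)

    def sgd(schedule, steps=100_000, x0=0.0, seed=0):
        random.seed(seed)
        x = x0
        for t in range(1, steps + 1):
            x -= schedule(t) * noisy_gradient(x)
        return x

    constant = sgd(lambda t: 0.1)      # violates the second condition
    decaying = sgd(lambda t: 1.0 / t)  # alpha_t = 1/t meets both conditions
    print(f"constant rate : x = {constant:.4f}")
    print(f"1/t schedule  : x = {decaying:.4f}  (optimum is x = 3)")

With the constant rate the iterates keep bouncing around the optimum because the noise never gets damped, while the decaying schedule lets the noise average out and the estimate keeps improving as the number of updates grows.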

Upvotes: 1
