Reputation: 89
As stated in the Wikipedia article on Q-learning (https://en.wikipedia.org/wiki/Q-learning#Learning_Rate), the learning rate is important for convergence in a stochastic problem. I have tried to find the "intuition" behind this without working through a mathematical proof, but I could not.
Specifically, it is hard for me to understand why updating Q-values slowly is beneficial in a stochastic environment. Could anyone please explain the intuition or motivation?
Upvotes: 2
Views: 580
Reputation: 1052
Once your estimates get close to their true values, the randomness of a stochastic environment makes it impossible to converge if the learning rate is too high: every update keeps knocking the estimate away from the target by an amount proportional to the learning rate.
Think of it like a ball rolling into a funnel. The speed at which the ball is rolling is like the learning rate. Because the environment is stochastic, the ball never rolls straight into the hole; it always just misses. Now, if the learning rate is too high, just missing is disastrous: the ball shoots right past the hole.
That is why you want to steadily decrease the learning rate. It is like the ball losing velocity to friction, which lets it eventually settle into the hole no matter which direction it is coming from. (Formally, the classic convergence conditions require the learning rates α_t to satisfy Σ α_t = ∞ and Σ α_t² < ∞, e.g. α_t = 1/t.)
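You can see this effect in a tiny sketch. The code below is not Q-learning on a full MDP; it just applies the same incremental update rule, `Q ← Q + α(r − Q)`, to estimate the mean of a noisy reward (all names here are my own, for illustration). With a constant, high learning rate the estimate keeps bouncing around the true value; with a decaying 1/t schedule it settles down:

```python
import random

def estimate_mean(alpha_schedule, n_steps=10000, true_mean=1.0,
                  noise_std=1.0, seed=0):
    """Estimate the mean of a noisy reward with the Q-learning-style
    update Q <- Q + alpha * (r - Q)."""
    rng = random.Random(seed)
    q = 0.0
    for t in range(1, n_steps + 1):
        r = true_mean + rng.gauss(0.0, noise_std)  # stochastic reward
        q += alpha_schedule(t) * (r - q)
    return q

# Constant (high) learning rate: the estimate keeps jittering,
# so the final value depends heavily on the last few noisy rewards.
q_const = estimate_mean(lambda t: 0.5)

# Decaying learning rate (alpha_t = 1/t satisfies the convergence
# conditions above, and makes q exactly the running sample mean):
q_decay = estimate_mean(lambda t: 1.0 / t)

print(q_const, q_decay)
```

Running this with different seeds, `q_decay` lands very close to the true mean of 1.0 every time, while `q_const` scatters widely around it: the constant learning rate never stops amplifying the noise, which is exactly the "ball shooting past the hole" picture.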
Upvotes: 1