hjung

Reputation: 89

Why is the learning rate for Q-learning important in stochastic environments?

As stated on Wikipedia (https://en.wikipedia.org/wiki/Q-learning#Learning_Rate), the choice of learning rate matters for convergence in a stochastic problem. I tried to find the intuition behind this, without a mathematical proof, but could not.

Specifically, it is difficult for me to understand why updating q-values slowly is beneficial for a stochastic environment. Could anyone please explain the intuition or motivation?

Upvotes: 2

Views: 580

Answers (1)

Qrow Saki

Reputation: 1052

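Each Q-learning update moves the estimate toward a single sampled target, and in a stochastic environment that target is noisy. Once you get close to convergence, a learning rate that is too high keeps knocking the estimate around, so it never settles.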

Think of it like a ball rolling into a funnel. The speed at which the ball is rolling is like the learning rate. Because the environment is stochastic, the ball never heads straight for the hole; it always just misses. If the learning rate is too high, just missing is disastrous: the ball shoots right past the hole.

That is why you want to steadily decrease the learning rate. It is like the ball losing velocity due to friction, which will always allow it to drop into the hole no matter which direction it's coming from.
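If it helps to see this numerically, here is a rough sketch rather than a full Q-learning implementation. It assumes a single state-action pair whose reward is the true value 1.0 plus Gaussian noise, and it drops the bootstrapped next-state term so that only the effect of the learning rate is visible. The names `run_updates` and `spread`, the noise level, and the two schedules (a constant 0.9 and a decaying 1/n) are my own choices for illustration.

```python
import random

def run_updates(alpha_schedule, n_steps=5000, true_mean=1.0, noise=1.0, seed=0):
    """Repeatedly apply q <- q + alpha * (r - q) with a noisy reward r.

    Returns the list of q estimates after each update.
    """
    rng = random.Random(seed)
    q = 0.0
    history = []
    for n in range(1, n_steps + 1):
        r = true_mean + rng.gauss(0.0, noise)  # stochastic reward sample
        alpha = alpha_schedule(n)              # learning rate at step n
        q += alpha * (r - q)                   # move q toward the noisy sample
        history.append(q)
    return history

def spread(values):
    """Max minus min: how much the estimate is still moving around."""
    return max(values) - min(values)

constant = run_updates(lambda n: 0.9)      # high, fixed learning rate
decaying = run_updates(lambda n: 1.0 / n)  # learning rate shrinks over time

# Look only at the last 500 updates, long after both runs have "arrived".
tail_constant = constant[-500:]
tail_decaying = decaying[-500:]

print("spread, constant alpha:", spread(tail_constant))
print("spread, decaying alpha:", spread(tail_decaying))
print("final estimate, decaying alpha:", decaying[-1])
```

With the constant rate, the estimate mostly tracks the latest noisy sample, so it keeps bouncing around the true value. With the 1/n rate, each update is a running average of all samples so far, so the bouncing dies out. That is the ball slowing down from friction.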

Upvotes: 1
