Reputation: 21
I'm relatively new to machine learning concepts, and I have been following several lectures/tutorials covering Q-Learning, such as: Stanford's Lecture on Reinforcement Learning
They all give short or vague answers about what exactly gamma does in the learning process. The most understandable explanation I have found so far says it is "how much we value future rewards."
Is it really that simple? Is gamma what defines how much we delay rewards / how far ahead we look? For example, knowing to take option B in the following case:
Given two options, A and B: A gives an immediate payoff of 10 followed by another 10, while B gives an immediate payoff of 0 followed by 30.
So, my questions:
Upvotes: 2
Views: 6855
Reputation: 306
The gamma parameter is indeed used to say something about how you value your future rewards. In more detail, the discounted reward (which is used in training) looks like:

R = r_1 + gamma * r_2 + gamma^2 * r_3 + ... = sum over t of gamma^t * r_(t+1)
This means that an exponential function decides on how the future rewards are taken into account. As an example, let's compare 2 gamma values:
Let's look at when gamma**steps reaches 0.5. In the case of gamma = 0.9, this takes about 6.6 steps; with gamma = 0.99 it takes about 69 steps. This means that with gamma = 0.9, a reward roughly 7 steps in the future is only half as important as the immediate reward, while with gamma = 0.99 the same is true only after about 69 steps. The drop-off is thus much less steep for gamma = 0.99, and future rewards are valued more highly than with gamma = 0.9. To choose a gamma for your application, it is important to have some feeling for how many steps your environment needs before the rewards arrive.
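You can verify those step counts directly: the number of steps at which gamma**steps equals 0.5 is ln(0.5) / ln(gamma). A minimal sketch in plain Python (the helper name half_life_steps is my own):

```python
import math

def half_life_steps(gamma):
    # Solve gamma**steps == 0.5 for steps: steps = ln(0.5) / ln(gamma).
    return math.log(0.5) / math.log(gamma)

for gamma in (0.9, 0.99):
    print(f"gamma={gamma}: a reward {half_life_steps(gamma):.1f} steps ahead "
          f"is worth half the immediate reward")
```

This prints roughly 6.6 steps for gamma = 0.9 and 69.0 steps for gamma = 0.99, matching the numbers above.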
To come back to your options A and B: gamma belongs to the agent, not to the options, but it decides which option looks better. With your rewards, A is worth 10 + 10*gamma and B is worth 30*gamma, so the agent prefers A when gamma < 0.5 and B when gamma > 0.5. A low gamma favors A because the immediate reward dominates; a higher gamma favors B because the future reward counts for more.
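The comparison can be sketched in a few lines of Python, computing the discounted return of each reward sequence for a few gamma values (the function name discounted_return is my own):

```python
def discounted_return(rewards, gamma):
    # Sum of gamma**t * r_t over the reward sequence, t starting at 0.
    return sum(gamma**t * r for t, r in enumerate(rewards))

A = [10, 10]  # immediate 10, then 10
B = [0, 30]   # immediate 0, then 30
for gamma in (0.3, 0.5, 0.9):
    print(f"gamma={gamma}: A={discounted_return(A, gamma):.1f}, "
          f"B={discounted_return(B, gamma):.1f}")
```

At gamma = 0.3 option A wins (13 vs 9), at gamma = 0.5 they tie (15 each), and at gamma = 0.9 option B wins (19 vs 27), illustrating the crossover at gamma = 0.5.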
Upvotes: 9