Reputation: 21
I'm relatively new to machine learning concepts, and I have been following several lectures/tutorials covering Q-Learning, such as: Stanford's Lecture on Reinforcement Learning
They all give short or vague answers about what exactly gamma does in the learning process. The most understandable explanation I have found so far says it is "how much we value future rewards."
Is it really that simple? Is gamma what defines how much we delay rewards / how far ahead we look? For example, knowing to take option B in the following case:
Given two options, A and B: A gives an immediate payoff of 10 followed by another 10, while B gives an immediate payoff of 0 followed by 30.
So, my questions:
Upvotes: 2
Views: 6855
Reputation: 306
The gamma parameter is indeed used to say something about how you value your future rewards. In more detail, the discounted reward (which is used in training) looks like:

R = r_1 + gamma * r_2 + gamma^2 * r_3 + ... = sum over t of gamma^t * r_(t+1)
This means that an exponential function decides on how the future rewards are taken into account. As an example, let's compare 2 gamma values:
Let's look at when gamma**steps reaches 0.5. In the case of gamma = 0.9, this takes about 6.6 steps; with gamma = 0.99 it takes about 69 steps. This means that with gamma = 0.9, a reward roughly 7 steps in the future is only half as important as the immediate reward, while with gamma = 0.99 the same is true only after about 69 steps. The drop-off is thus much less steep for gamma = 0.99, and future rewards are valued more highly than with gamma = 0.9. To choose a gamma for your application, it is important to have some feeling for how many steps your environment needs before the rewards arrive.
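You can verify those step counts directly: the number of steps at which gamma**steps equals 0.5 is ln(0.5) / ln(gamma). A minimal sketch in plain Python (the helper name half_life_steps is my own):

```python
import math

def half_life_steps(gamma):
    # Solve gamma**steps == 0.5 for steps: steps = ln(0.5) / ln(gamma).
    return math.log(0.5) / math.log(gamma)

for gamma in (0.9, 0.99):
    print(f"gamma={gamma}: a reward {half_life_steps(gamma):.1f} steps ahead "
          f"is worth half the immediate reward")
```

This prints roughly 6.6 steps for gamma = 0.9 and 69.0 steps for gamma = 0.99, matching the numbers above.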
To come back to your options A and B: gamma belongs to the agent, not to the options, but it decides which option looks better. With your rewards, A is worth 10 + 10*gamma and B is worth 30*gamma, so the agent prefers A when gamma < 0.5 and B when gamma > 0.5. A low gamma favors A because the immediate reward dominates; a higher gamma favors B because the future reward counts for more.
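The comparison can be sketched in a few lines of Python, computing the discounted return of each reward sequence for a few gamma values (the function name discounted_return is my own):

```python
def discounted_return(rewards, gamma):
    # Sum of gamma**t * r_t over the reward sequence, t starting at 0.
    return sum(gamma**t * r for t, r in enumerate(rewards))

A = [10, 10]  # immediate 10, then 10
B = [0, 30]   # immediate 0, then 30
for gamma in (0.3, 0.5, 0.9):
    print(f"gamma={gamma}: A={discounted_return(A, gamma):.1f}, "
          f"B={discounted_return(B, gamma):.1f}")
```

At gamma = 0.3 option A wins (13 vs 9), at gamma = 0.5 they tie (15 each), and at gamma = 0.9 option B wins (19 vs 27), illustrating the crossover at gamma = 0.5.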
Upvotes: 9