Reputation: 507
I am having trouble understanding the SARSA algorithm: http://en.wikipedia.org/wiki/SARSA
In particular, when updating the Q value, what is gamma? And what values are used for s(t+1) and a(t+1)?
Can someone explain this algorithm to me?
Thanks.
Upvotes: 5
Views: 3087
Reputation: 14051
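Gamma (γ) is the discount factor: it determines how much future rewards count relative to the immediate reward. The update target is r(t+1) + γ·Q(s(t+1), a(t+1)), so with γ = 0.0 the agent only cares about the immediate reward r(t+1), while values close to 1.0 make rewards far in the future count almost as much as immediate ones. How much memory your algorithm has is instead set by the learning rate α. If you set α to 0.0, your algorithm will not update the value function Q at all. If you set it to 1.0, each new experience completely overwrites the previous estimate. The best values lie in between and have to be determined experimentally.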
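A minimal Python sketch of a single SARSA update makes the roles of the two parameters concrete. The function name `sarsa_update` and the numbers are hypothetical. The formula is the standard update Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[r(t+1) + γ·Q(s(t+1), a(t+1)) - Q(s(t), a(t))], with `alpha` as the learning rate and `gamma` as the discount factor.

```python
def sarsa_update(q_sa, reward, q_next, alpha, gamma):
    """Return the new estimate of Q(s(t), a(t)) after one SARSA step.

    q_sa:   current estimate Q(s(t), a(t))
    reward: reward r(t+1) received after taking a(t)
    q_next: Q(s(t+1), a(t+1)) for the action actually chosen next
    alpha:  learning rate, i.e. how far to move toward the new target
    gamma:  discount factor, i.e. how much the next state's value counts
    """
    target = reward + gamma * q_next
    return q_sa + alpha * (target - q_sa)

# Same experience (Q = 2.0, r = 1.0, next Q = 4.0), different parameters:
print(sarsa_update(2.0, 1.0, 4.0, alpha=0.0, gamma=0.5))  # 2.0: alpha = 0, Q never changes
print(sarsa_update(2.0, 1.0, 4.0, alpha=1.0, gamma=0.5))  # 3.0: alpha = 1, Q replaced by target 1 + 0.5*4
print(sarsa_update(2.0, 1.0, 4.0, alpha=0.5, gamma=0.5))  # 2.5: halfway between old value and target
print(sarsa_update(2.0, 1.0, 4.0, alpha=0.5, gamma=0.0))  # 1.5: gamma = 0, only the immediate reward counts
```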
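Here is how it works:
1. In state s(t), take action a(t), chosen by your current policy (for example ε-greedy with respect to Q).
2. Observe the reward r(t+1) and the next state s(t+1).
3. In s(t+1), choose the next action a(t+1) with the same policy.
4. Update Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[r(t+1) + γ·Q(s(t+1), a(t+1)) - Q(s(t), a(t))].
5. Set s(t) ← s(t+1) and a(t) ← a(t+1), and repeat.
So s(t+1) and a(t+1) are simply the next state you actually end up in and the next action you actually take; the sequence s, a, r, s, a is where the name SARSA comes from.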
In effect, Q(s, a) is just a running (exponentially weighted) average of the update targets r(t+1) + γ·Q(s(t+1), a(t+1)), kept separately for every state-action pair.
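Putting the algorithm together, a minimal tabular SARSA loop in Python might look like the sketch below. The corridor environment, the helpers `step`, `epsilon_greedy` and `sarsa`, the ε-greedy policy and the parameter values are all hypothetical choices for illustration, not part of the question or of the Wikipedia article.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor of states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4 ends the
# episode with reward 1.0, every other step gives reward 0.0.
N_STATES = 5
ACTIONS = (0, 1)

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [Q[(state, a)] for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])

def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0.0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)  # a(t) from the policy
        done = False
        while not done:
            s_next, r, done = step(s, a)  # observe r(t+1) and s(t+1)
            if done:
                target = r  # terminal state: no successor value
            else:
                a_next = epsilon_greedy(Q, s_next, epsilon, rng)  # a(t+1) from the same policy
                target = r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # SARSA update
            if not done:
                s, a = s_next, a_next  # the chosen a(t+1) is actually taken next
    return Q

Q = sarsa()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

Note that a(t+1) is chosen before the update and is then the action actually taken on the next step. Using the value of the action you really take, rather than the best possible one, is what makes SARSA on-policy.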
Upvotes: 4