Reputation: 1523
I use n-step Sarsa, and sometimes Sarsa(lambda).
After experimenting with different epsilon schedules, I found that the agent learns faster when I vary epsilon within an episode, based on the number of steps already taken and the mean length of the last 10 episodes:
Few steps taken / beginning of episode => low epsilon
Many steps taken / end of episode => high epsilon
This works far better than a simple epsilon decay from episode to episode.
Does the theory allow this?
I think it does, because all states are still visited regularly.
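As a rough sketch of the schedule described above (the function name, bounds, and linear interpolation are my own assumptions, not the exact implementation used):

```python
def epsilon_for_step(steps_taken, recent_episode_lengths,
                     eps_min=0.01, eps_max=0.3):
    """Hypothetical within-episode schedule: epsilon rises with the
    fraction of the mean recent episode length already elapsed."""
    mean_len = sum(recent_episode_lengths) / len(recent_episode_lengths)
    progress = min(steps_taken / mean_len, 1.0)  # 0 at episode start, 1 at/after mean length
    return eps_min + (eps_max - eps_min) * progress

# Early in the episode -> low epsilon; near the mean length -> high epsilon.
print(epsilon_for_step(0, [100] * 10))    # eps_min
print(epsilon_for_step(100, [100] * 10))  # close to eps_max
```
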
Upvotes: 2
Views: 1341
Reputation: 6689
Yes, the SARSA algorithm converges even if you update the epsilon parameter within each episode. The requirement is that epsilon eventually tends to zero (or to a small value).
In your case, since you start each episode with a small epsilon and increase it as the number of steps grows, it's not clear to me that your algorithm will converge towards an optimal policy. I mean, at some point epsilon should decrease.
The "best" epsilon schedule is highly problem-dependent, and there is no single schedule that works well for all problems. So, in the end, some experience with the problem, and probably some trial-and-error adjustment, is required.
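One way to reconcile a within-episode increase with the requirement that epsilon eventually tends to a small value is to decay the per-episode ceiling across episodes. This is only a sketch under my own assumptions (names, constants, and the geometric decay are made up):

```python
def scheduled_epsilon(episode, steps_taken, mean_episode_length,
                      eps_min=0.01, eps_max0=0.5, decay=0.99):
    """Hypothetical schedule: epsilon rises within an episode, but its
    ceiling decays geometrically across episodes, so the whole schedule
    still tends toward eps_min in the limit."""
    eps_max = eps_min + (eps_max0 - eps_min) * decay ** episode  # shrinking ceiling
    progress = min(steps_taken / max(mean_episode_length, 1.0), 1.0)
    return eps_min + (eps_max - eps_min) * progress

# Late episodes explore less, even at their end.
print(scheduled_epsilon(0, 100, 100))     # close to eps_max0
print(scheduled_epsilon(1000, 100, 100))  # close to eps_min
```
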
Upvotes: 3