siva

Reputation: 1523

Does Sarsa still converge even when epsilon changes during each episode?

I use n-step Sarsa, and sometimes Sarsa(lambda).

After experimenting with different epsilon schedules, I found that the agent learns faster when I change epsilon during an episode, based on the number of steps already taken and the mean length of the last 10 episodes.

Low number of steps/beginning of episode => Low epsilon
High number of steps/end of episode => High epsilon

This works far better than just an epsilon decay over time from episode to episode.
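
To make the schedule concrete, here is a minimal sketch of what I mean. The linear ramp, the function name `within_episode_epsilon`, and the bounds `eps_min` and `eps_max` are only illustrative assumptions, not the exact values I use.

```python
from collections import deque


def within_episode_epsilon(step, recent_lengths, eps_min=0.01, eps_max=0.3):
    """Ramp epsilon linearly from eps_min to eps_max over a typical episode.

    step           -- number of steps already taken in the current episode
    recent_lengths -- lengths of the most recent episodes (e.g. the last 10)
    eps_min/eps_max are illustrative bounds, not tuned values.
    """
    if not recent_lengths:
        # No history yet: fall back to the high-exploration value.
        return eps_max
    mean_length = sum(recent_lengths) / len(recent_lengths)
    # Fraction of a "typical" episode already completed, capped at 1.
    progress = min(step / max(mean_length, 1.0), 1.0)
    return eps_min + (eps_max - eps_min) * progress


# A deque with maxlen=10 keeps only the last 10 episode lengths.
recent_lengths = deque([100, 120, 80], maxlen=10)
eps_start = within_episode_epsilon(0, recent_lengths)   # low epsilon
eps_late = within_episode_epsilon(500, recent_lengths)  # high epsilon
```

After each episode I would append its length to `recent_lengths`, so the ramp adapts to how long episodes currently are.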

Does the theory allow this?

I think yes because all states are still visited regularly.

Upvotes: 2

Views: 1341

Answers (1)

Pablo EM

Reputation: 6689

Yes, the SARSA algorithm converges even if you update the epsilon parameter within each episode. The requirement is that epsilon eventually tends to zero or to a small value.

In your case, if you start each episode with a small epsilon and increase it as the number of steps grows, it is not clear to me that your algorithm will converge towards an optimal policy. At some point epsilon has to decrease.
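
One way to reconcile the two ideas is to keep the within-episode ramp but shrink its ceiling from episode to episode. The following is only a sketch under assumed names and constants (`decayed_within_episode_epsilon`, `eps_max0`, `decay_episodes`, `floor_fraction` are all hypothetical), and whether it meets the formal convergence conditions still depends on the problem.

```python
def decayed_within_episode_epsilon(step, mean_length, episode,
                                   eps_max0=0.3, decay_episodes=100.0,
                                   floor_fraction=0.1):
    """Within-episode epsilon ramp whose ceiling decays over episodes.

    step        -- steps already taken in the current episode
    mean_length -- mean length of recent episodes
    episode     -- index of the current episode, starting at 0
    All default constants are illustrative assumptions.
    """
    # Ceiling shrinks towards zero as the episode index grows.
    cap = eps_max0 / (1.0 + episode / decay_episodes)
    # Fraction of a typical episode already completed, capped at 1.
    progress = min(step / max(mean_length, 1.0), 1.0)
    # Keep a small positive share of the cap even at step 0, so early
    # states are still explored occasionally.
    return cap * (floor_fraction + (1.0 - floor_fraction) * progress)


eps_early_training = decayed_within_episode_epsilon(100, 100.0, episode=0)
eps_late_training = decayed_within_episode_epsilon(100, 100.0, episode=1000)
```

With this shape, epsilon still rises within an episode, but its value at any given step decreases as training goes on.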

The "best" epsilon schedule is highly problem dependent, and no single schedule works well for every problem. In the end, it takes some experience with the problem and probably some trial-and-error tuning.

Upvotes: 3
