Reputation: 1138
I am teaching an agent to find its way out of a maze, collecting all the apples along the way, using Q-learning.
I read that it is possible to keep a fixed epsilon or to choose an initial epsilon and decay it over time.
I couldn't find the advantages and disadvantages of each approach, and I would love to hear more if you can help me understand which one I should use.
Upvotes: 0
Views: 1668
Reputation: 4585
I'm going to assume you're referring to epsilon as in "epsilon-greedy exploration". The goal of this parameter is to control how much your agent trusts its current policy. With a large epsilon value, your agent will tend to ignore its policy and choose a random action. This exploration is often a good idea when your policy is rather weak, especially at the beginning of training. Some people decay epsilon over time to reflect that the policy keeps improving and they want the agent to exploit it rather than explore.
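To make this concrete, here is a minimal sketch of epsilon-greedy action selection with a decaying epsilon. The names (`epsilon_greedy_action`, `q_values`) and the specific decay constants are just illustrative assumptions, not anything standard:

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Illustrative decay schedule: start fully exploratory, never drop below 5%.
epsilon = 1.0
epsilon_min = 0.05
decay = 0.995

q_values = [0.1, 0.7, 0.3, 0.0]  # toy Q-values for one state, 4 actions

for episode in range(1000):
    action = epsilon_greedy_action(q_values, epsilon)
    # ... take the action, observe the reward, update the Q-table here ...
    epsilon = max(epsilon_min, epsilon * decay)  # anneal toward exploitation
```

With this schedule the agent acts almost entirely at random early on and becomes mostly greedy after a few hundred episodes; a fixed epsilon would instead keep the same exploration rate throughout.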
There is no right way to pick epsilon, or its decay rate, for every problem. The best way is probably to try out different values.
Upvotes: 2