Baaam Park

Reputation: 425

What decides epsilon decay value in reinforcement learning?

I've been learning Q-learning from the YouTube lecture below: https://www.youtube.com/watch?v=Gq1Azv_B4-4&list=PLlMOxjd7OfgNxJSgF8pAs3_qMion-X1QI&index=2

In this tutorial, the presenter uses epsilon like this (I cut the details out):

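import gym
import numpy as np

env = gym.make("MountainCar-v0")

EPISODES = 2000
epsilon = 0.5  # initial probability of taking a random action
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2
# amount subtracted from epsilon at each decay step (this part is very confusing to me)
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)

for episode in range(EPISODES):
    done = False
    while not done:
        # ... (details cut: q_table and discrete_state are set up elsewhere)
        if np.random.random() > epsilon:
            # exploit: take the action with the highest Q-value
            action = np.argmax(q_table[discrete_state])
        else:
            # explore: take a random action
            action = np.random.randint(0, env.action_space.n)
        # ... (details cut: stepping the environment and updating done)

    # decay epsilon once per episode, only inside the decay window
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value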

I somewhat understand the concept of epsilon-greedy, but I haven't the faintest idea how to apply it in a program. What I understand is that epsilon-greedy balances exploration and exploitation. But I don't know why epsilon should be decreased over time, or what determines the formula for the epsilon decay value.

Upvotes: 0

Views: 3548

Answers (1)

M Z

Reputation: 4799

Epsilon is decreased because, as your model explores and learns, exploring becomes less important and following the learned policy becomes more important. Imagine this scenario: if your model still "explores" after learning a good policy, it may well choose an action it already knows to be a poor choice. Epsilon-greedy is there to help the learning process, not the decision-making process.
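To make that concrete, here is a minimal sketch (not taken from the video). The function name choose_action and the Q-values are made up for illustration, and it uses Python's standard random module instead of NumPy so that it runs on its own.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        # explore: pick any action uniformly at random
        return rng.randrange(len(q_values))
    # exploit: pick the action with the highest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for one state; action 1 is the best-known action.
q_values = [-1.2, 0.4, -0.3]

rng = random.Random(0)
explore_picks = [choose_action(q_values, 0.5, rng) for _ in range(1000)]
greedy_picks = [choose_action(q_values, 0.0, rng) for _ in range(1000)]

# Count how often a known-worse action was taken.
print("suboptimal picks with epsilon=0.5:", sum(a != 1 for a in explore_picks))
print("suboptimal picks with epsilon=0.0:", sum(a != 1 for a in greedy_picks))
```

With epsilon at 0.5 a noticeable share of the picks are actions already known to be worse, which is the cost of exploring. With epsilon at 0 the agent always takes the best-known action, which is what you want once the Q-values can be trusted.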

Epsilon decay often follows an exponential function, meaning epsilon is multiplied by a fixed factor below 1 after every x episodes. I believe sentdex actually provides one later in his videos. The snippet in your question decays linearly instead: it subtracts the same amount each time, namely the starting epsilon divided by the length of the decay window, so that when it is applied once per episode epsilon reaches roughly zero at END_EPSILON_DECAYING. The key factor in choosing a decay function is the rate at which it decays (in the exponential case: by what percentage, and after how many episodes?). There's also the question of whether your environment would benefit from flooring the function, i.e. never letting epsilon drop below some minimum.
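As a rough sketch of both kinds of schedule, the code below compares a linear decay built from the question's formula with an exponential decay that has a floor. The function names, the 0.995 factor, and the 0.01 floor are assumptions chosen for illustration, not values from the video. One detail worth noting: with the question's numbers the window runs from episode 1 to 1000 inclusive (1000 subtractions) while the divisor is 999, so an unclamped epsilon would end slightly below zero; the clamp to 0 here is my addition.

```python
def linear_schedule(start_eps, start_episode, end_episode, episodes):
    """Subtract a fixed amount per episode inside the decay window."""
    decay = start_eps / (end_episode - start_episode)  # the question's formula
    eps, history = start_eps, []
    for episode in range(episodes):
        history.append(eps)  # epsilon used during this episode
        if start_episode <= episode <= end_episode:
            eps = max(0.0, eps - decay)  # clamp at 0 (an addition, see note above)
    return history

def exponential_schedule(start_eps, rate, floor, episodes):
    """Multiply by a fixed rate each episode, never dropping below the floor."""
    eps, history = start_eps, []
    for _ in range(episodes):
        history.append(eps)
        eps = max(floor, eps * rate)
    return history

EPISODES = 2000
linear = linear_schedule(0.5, 1, EPISODES // 2, EPISODES)
expo = exponential_schedule(0.5, 0.995, 0.01, EPISODES)  # rate and floor are assumed values

for episode in (0, 250, 500, 1000, 1999):
    print(episode, round(linear[episode], 4), round(expo[episode], 4))
```

The linear schedule stops exploring entirely after the halfway point, while the floored exponential schedule keeps a small amount of exploration for the whole run. Which one suits you depends on how much the environment rewards continued exploration.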

Upvotes: 2
