What decides the epsilon decay value in reinforcement learning?

I've been learning Q-learning from this YouTube lecture: https://www.youtube.com/watch?v=Gq1Azv_B4-4&list=PLlMOxjd7OfgNxJSgF8pAs3_qMion-X1QI&index=2

In this tutorial, the instructor implements epsilon-greedy exploration like this (I cut out the details):

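import gym
import numpy as np

env = gym.make("MountainCar-v0")

EPISODES = 2000
epsilon = 0.5                            # initial exploration rate
START_EPSILON_DECAYING = 1               # first episode in which epsilon is decayed
END_EPSILON_DECAYING = EPISODES // 2     # last episode in which epsilon is decayed
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)  # this part is very confusing to me

for episode in range(EPISODES):
    done = False
    # ... (details cut: reset the env, build q_table, compute discrete_state) ...
    while not done:
        if np.random.random() > epsilon:
            # exploit: take the action with the highest Q-value
            action = np.argmax(q_table[discrete_state])
        else:
            # explore: take a random action
            action = np.random.randint(0, env.action_space.n)
        # ... (details cut: step the env, update q_table and discrete_state, set done) ...

    # decay epsilon once per episode (not once per step) inside the decay window
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value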
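
The formula for epsilon_decay_value describes a linear schedule: it spreads the initial epsilon evenly over the episodes in the decay window, so epsilon drops by the same amount every episode and reaches roughly zero at END_EPSILON_DECAYING. The sketch below isolates that schedule, assuming the decay is applied once per episode; the helper epsilon_trace is hypothetical and only records epsilon instead of training anything.

```python
# Hypothetical helper that replays the snippet's linear epsilon schedule in isolation.
EPISODES = 2000
START_EPSILON = 0.5
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2

# Amount subtracted from epsilon once per episode inside the decay window.
epsilon_decay_value = START_EPSILON / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)


def epsilon_trace(num_episodes):
    """Return the epsilon value in effect at the start of every episode."""
    epsilon = START_EPSILON
    trace = []
    for episode in range(num_episodes):
        trace.append(epsilon)
        # Same condition as the snippet: decay only inside the (inclusive) window.
        if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
            epsilon -= epsilon_decay_value
    return trace


trace = epsilon_trace(EPISODES)
print(trace[0], trace[500], trace[-1])  # roughly 0.5, 0.25, and a tiny negative number
```

Because the window is inclusive at both ends, epsilon is decremented 1000 times by a step computed for 999, so it ends slightly below zero. That is harmless here (np.random.random() > epsilon is then always true, so the agent is fully greedy), but clamping with max(epsilon, 0.0) or a small floor makes the intent explicit.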

I somewhat understand the concept of epsilon-greedy, but I haven't the faintest idea how to apply it when programming. What I understood is that epsilon-greedy balances exploration and exploitation. But I don't know why epsilon should be decreased over time, or what determines the formula for the epsilon decay value.

What decides the epsilon decay value in reinforcement learning?
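
Decaying epsilon is a common heuristic rather than a fixed rule. Early in training the Q-table holds little information, so random actions help the agent visit more states; as the estimates improve, acting greedily more often lets it use what it has learned. The decay value is a hyperparameter: one common way to pick it is to choose a start value, a floor, and the episode by which the floor should be reached, then solve for the per-episode change. The snippet's formula is the linear case with a floor of 0 over END_EPSILON_DECAYING - START_EPSILON_DECAYING episodes. The following is a minimal sketch under those assumptions; the helper names, the 0.01 floor, and the exponential alternative are hypothetical choices, not taken from the tutorial.

```python
import numpy as np


def linear_decay_step(eps_start, eps_end, n_episodes):
    """Amount to subtract per episode so epsilon goes eps_start -> eps_end in n_episodes."""
    return (eps_start - eps_end) / n_episodes


def exponential_decay_rate(eps_start, eps_end, n_episodes):
    """Factor to multiply by per episode so epsilon goes eps_start -> eps_end in n_episodes."""
    return (eps_end / eps_start) ** (1.0 / n_episodes)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action (explore), else the best one (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


eps_start, eps_end, n = 0.5, 0.01, 1000
step = linear_decay_step(eps_start, eps_end, n)
rate = exponential_decay_rate(eps_start, eps_end, n)

# Halfway through, the two schedules differ a lot even though they share both endpoints.
half = n // 2
lin_half = eps_start - half * step   # about 0.255
exp_half = eps_start * rate ** half  # about 0.0707

eps_lin = eps_exp = eps_start
for episode in range(n):
    eps_lin = max(eps_end, eps_lin - step)  # linear: constant decrement, clamped at the floor
    eps_exp = max(eps_end, eps_exp * rate)  # exponential: constant ratio, clamped at the floor

print(lin_half, exp_half, eps_lin, eps_exp)
```

Both schedules hit the same floor at the same episode; the exponential one spends fewer episodes at high epsilon. Which shape and which numbers work best depends on the environment and is usually found by experiment.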
