Reputation: 345
I am using the Q-Learning algorithm on a simulation. This simulation has a limited number of iterations (600 to 700), and the learning process is run over several simulation runs (100 runs). I am new to reinforcement learning, and I have a question about how to handle exploration/exploitation in this kind of setup (I am using ε-greedy exploration). Since I am using a decreasing exploration rate, should I decay epsilon over the whole set of simulation runs, or decay it within each simulation run (initialize epsilon to 0.9 at the start of every run and then decrease it)? Thank you
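To make the two options concrete, here is a rough sketch of what I mean (the decay to a fixed floor and the exact numbers are just placeholders, not my real settings):

```python
NUM_RUNS = 100          # number of simulation runs
STEPS_PER_RUN = 650     # roughly 600-700 iterations per run

# Option A: one global schedule, epsilon keeps decreasing across all runs
def epsilon_global(total_step, start=0.9, end=0.05):
    total_steps = NUM_RUNS * STEPS_PER_RUN
    frac = min(total_step / total_steps, 1.0)
    return start + frac * (end - start)

# Option B: reset epsilon to 0.9 at the start of every run and decay it within the run
def epsilon_per_run(step_in_run, start=0.9, end=0.05):
    frac = min(step_in_run / STEPS_PER_RUN, 1.0)
    return start + frac * (end - start)
```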
Upvotes: 1
Views: 105
Reputation: 17282
You won't need such a high initial epsilon. It might be better to initialize the Q-values optimistically (very high), so that state-action pairs that have never been tried are always preferred over ones that have been explored at least once.
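As a rough illustration of what I mean by optimistic initialization (the table shape and the starting value of 10.0 are just placeholders, pick something above any realistic return in your simulation):

```python
import numpy as np

N_STATES, N_ACTIONS = 50, 4   # placeholder sizes for the example

# Optimistic initialization: every unexplored (state, action) pair starts high,
# so a greedy policy naturally tries actions it has not been updated yet.
Q = np.full((N_STATES, N_ACTIONS), 10.0)

def greedy_action(state):
    # Unvisited actions still hold the optimistic value and win the argmax.
    return int(np.argmax(Q[state]))
```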
Considering your state space, it doesn't matter much whether you decrease epsilon over the whole set of runs or within each individual run, but decaying it per run sounds like the better option.
How fast you decrease it also depends on the environment and on how quickly the agent learns. I'm trying to make my alpha and epsilon correlate with the error, but that's tricky to get right.
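One way I've been experimenting with (just a sketch, not a definitive recipe, and the smoothing factor and bounds are arbitrary) is to drive epsilon from a running average of the absolute TD error, so exploration shrinks as the value estimates stabilize:

```python
avg_td_error = 1.0   # running estimate of |TD error|, starts pessimistically high

def update_epsilon(td_error, beta=0.01, eps_min=0.05, eps_max=0.9):
    """Scale epsilon with the recent TD error; alpha could be scaled the same way."""
    global avg_td_error
    avg_td_error = (1 - beta) * avg_td_error + beta * abs(td_error)
    # Larger recent errors -> more exploration; clamp to a sensible range.
    return min(eps_max, max(eps_min, avg_td_error))
```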
Upvotes: 1