I'm starting to play around with https://github.com/openai/baselines/, specifically the deepq algorithm. I wanted to do my own analysis of the parameters passed into the deepq.learn method.
The method has two parameters related to exploration: exploration_fraction and exploration_final_eps.
The way I understand it, exploration_fraction determines how much of the training time the algorithm spends exploring, and exploration_final_eps drives the probability of taking a random action each time it explores. So the number of random actions taken for the sake of exploring is a product of exploration_fraction and exploration_final_eps. Is that correct?
Can someone provide an explanation (in layman terms) of how the algorithm explores, based on these two parameters?
Upvotes: 0
Your understanding is almost correct. The probability p of taking a random action (i.e., an exploratory action) is a number that typically starts high and decreases over time. This makes sense because at the beginning of training the policy being learned is still useless, but it gets better as learning progresses.
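To make "taking a random action with probability p" concrete, here is a sketch of the usual epsilon-greedy rule; the function name and arguments are illustrative, not baselines' actual API:

```python
import random

def epsilon_greedy(p, greedy_action, num_actions, rng=random):
    # Illustrative sketch, not baselines' code.
    # With probability p, explore: pick a uniformly random action.
    if rng.random() < p:
        return rng.randrange(num_actions)
    # Otherwise, exploit: follow the current greedy policy.
    return greedy_action
```

With p = 1 every action is random; with p = 0 the agent always follows its greedy policy.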
Taking that into account, exploration_fraction and exploration_final_eps are the parameters that control how the probability p decreases over time. When you explore the code in the repo, you find the following lines:
# Create the schedule for exploration starting from 1.
exploration = LinearSchedule(schedule_timesteps=int(exploration_fraction * total_timesteps),
initial_p=1.0,
final_p=exploration_final_eps)
Here it's easier to understand the meaning of exploration_fraction and exploration_final_eps:
- exploration_fraction determines for how long (in timesteps) p decreases. Notice that in this case p = 1 initially, but this initial value may vary.
- exploration_final_eps determines the minimum value of p. Once the probability has decreased during the period indicated by exploration_fraction, p remains fixed at a value equal to exploration_final_eps.

Sometimes p decreases linearly, as in the case of LinearSchedule, but other schedules are also possible.
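As a concrete illustration, the linearly decaying p can be computed like this; a simplified sketch of the interpolation, not the library's actual LinearSchedule implementation:

```python
def linear_p(t, total_timesteps, exploration_fraction, exploration_final_eps,
             initial_p=1.0):
    # Sketch of a linear decay, not baselines' actual LinearSchedule code.
    # Length of the decay phase, as in the snippet above.
    schedule_timesteps = int(exploration_fraction * total_timesteps)
    # Fraction of the decay phase completed, capped at 1.0.
    fraction = min(float(t) / schedule_timesteps, 1.0)
    # Linear interpolation from initial_p down to exploration_final_eps.
    return initial_p + fraction * (exploration_final_eps - initial_p)
```

For example, with total_timesteps=100000, exploration_fraction=0.1 and exploration_final_eps=0.02, p decays linearly from 1.0 to 0.02 over the first 10,000 timesteps and stays at 0.02 afterwards.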
Upvotes: 3