Reputation: 61
I'm new to reinforcement learning. Recently, I've been trying to train a Deep Q-Network to solve OpenAI Gym's CartPole-v0, where solving means achieving an average score of at least 195.0 over 100 consecutive episodes.
I am using a two-layer neural network, experience replay with a memory holding 1 million experiences, an epsilon-greedy policy, the RMSProp optimizer, and the Huber loss function.
With this setup, solving the task is taking tens of thousands of episodes (> 30k). Learning is also quite unstable at times. So, is it normal for Deep Q-Networks to oscillate and take this long to learn a task like this? What other alternatives (or improvements to my DQN) could give better results?
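For reference, my setup looks roughly like this (a minimal sketch, assuming PyTorch and the classic gym reset/step API; the hyperparameters here are illustrative, not my exact values):

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v0")
q_net = nn.Sequential(  # two-layer network: 4 state inputs -> 2 Q-values
    nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)
)
opt = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)
huber = nn.SmoothL1Loss()          # PyTorch's Huber loss
memory = deque(maxlen=1_000_000)   # experience replay buffer
gamma, epsilon, batch_size = 0.99, 0.1, 64  # fixed epsilon for brevity; a decaying schedule is typical

for episode in range(500):
    state = env.reset()  # classic gym API; newer gym/gymnasium returns (obs, info)
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
        next_state, reward, done, _ = env.step(action)
        memory.append((state, action, reward, next_state, done))
        state = next_state

        if len(memory) >= batch_size:
            # sample a minibatch of transitions and do one TD update
            batch = random.sample(memory, batch_size)
            s, a, r, s2, d = (
                torch.as_tensor(np.array(x), dtype=torch.float32) for x in zip(*batch)
            )
            q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * q_net(s2).max(1).values * (1 - d)
            loss = huber(q, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```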
Upvotes: 3
Views: 1094
Reputation: 361
What other alternatives (or improvements on my DQN) can give better results?
In my experience, policy gradients work well on CartPole. They are also fairly easy to implement (if you squint, policy gradients almost look like supervised learning).
A good place to start: http://kvfrans.com/simple-algoritms-for-solving-cartpole/
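To make the "almost supervised learning" point concrete, here is a minimal REINFORCE-style sketch (assuming PyTorch and the classic gym API; hyperparameters are illustrative and untuned):

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v0")
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    state = env.reset()  # classic gym API; newer versions return (obs, info)
    log_probs, rewards, done = [], [], False
    while not done:
        # sample an action from the current policy and record its log-probability
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # discounted returns, computed backwards from the end of the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize for stability

    # the "almost supervised" step: weight each log-prob by its return
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key difference from a DQN is that there is no replay buffer or Q-target: each episode directly provides (state, action, return) triples, and the update pushes up the log-probability of actions in proportion to the return that followed them.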
Upvotes: 2