Reputation: 339
I implemented DQN from scratch in Java; everything is custom made. I made it play Snake and the results are really good, but I have a problem.
To make the network as stable as possible, I'm using replay memory and also a target network. The network converges really well, but after some time it just breaks.
Here is a graph of the results (X axis: games played, Y axis: average points scored).
This 'break' usually happens a few games after I update the target network with the policy network.
Settings I use for DQN (a sketch of how they plug into my update loop follows this list):
discount factor: 0.9
learning rate: 0.001
steps to update target network: 300 000 (every 300k steps the target network is overwritten with the policy network's weights)
replay memory size: 300 000
replay memory batch size: 256 (every step I sample 256 transitions from replay memory and train the network)
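Roughly, this is how those settings plug into my update loop (simplified sketch only; `Network` and `Transition` below are placeholders, not my real classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Placeholder for my real network class: trains on a batch using the target network.
interface Network {
    void train(List<Transition> batch, Network target, double gamma, double learningRate);
    Network copy(); // deep copy of all weights
}

// One stored experience: (state, action, reward, next state, terminal flag).
record Transition(double[] state, int action, double reward,
                  double[] nextState, boolean terminal) {}

class DqnLoopSketch {
    static final double GAMMA = 0.9;
    static final double LEARNING_RATE = 0.001;
    static final int TARGET_UPDATE_STEPS = 300_000;
    static final int REPLAY_CAPACITY = 300_000;
    static final int BATCH_SIZE = 256;

    final List<Transition> replay = new ArrayList<>(); // used as a ring buffer
    final Random rng = new Random();
    int writeIndex = 0;
    long steps = 0;

    Network policy; // trained every step
    Network target; // frozen copy that provides the bootstrap targets

    void step(Transition t) {
        // 1. Store the transition, overwriting the oldest one once the buffer is full.
        if (replay.size() < REPLAY_CAPACITY) {
            replay.add(t);
        } else {
            replay.set(writeIndex, t);
            writeIndex = (writeIndex + 1) % REPLAY_CAPACITY;
        }

        // 2. Every step, sample 256 random transitions and train the policy network.
        if (replay.size() >= BATCH_SIZE) {
            List<Transition> batch = new ArrayList<>(BATCH_SIZE);
            for (int i = 0; i < BATCH_SIZE; i++) {
                batch.add(replay.get(rng.nextInt(replay.size())));
            }
            policy.train(batch, target, GAMMA, LEARNING_RATE);
        }

        // 3. Every 300k steps, hard-copy the policy weights into the target network.
        //    The collapse in the graph starts a few games after this copy happens.
        if (++steps % TARGET_UPDATE_STEPS == 0) {
            target = policy.copy();
        }
    }
}
```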
Any ideas what could be wrong? Thanks for any answers.
Upvotes: 0
Views: 524
Reputation: 1
Look up "catastrophic forgetting".
Try adjusting your replay-memory size and the number of steps to update your target network.
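If the hard copy every 300k steps turns out to be the trigger, one common variant worth trying (not something your current setup uses) is a soft, or Polyak, target update: every step, blend a small fraction of the policy weights into the target weights instead of overwriting them all at once, so the bootstrap targets never jump abruptly. A minimal sketch, assuming the weights can be exposed as `double[layer][param]` arrays (that layout is only for illustration):

```java
// Soft ("Polyak") target update: target <- (1 - tau) * target + tau * policy.
final class SoftTargetUpdate {
    static final double TAU = 0.001; // small blend factor, tuned like any other hyperparameter

    static void update(double[][] targetWeights, double[][] policyWeights) {
        for (int layer = 0; layer < targetWeights.length; layer++) {
            for (int i = 0; i < targetWeights[layer].length; i++) {
                targetWeights[layer][i] =
                        (1.0 - TAU) * targetWeights[layer][i]
                        + TAU * policyWeights[layer][i];
            }
        }
    }
}
```

Called every training step, this keeps the target network trailing the policy network smoothly, which tends to avoid the sudden shift in Q-targets that a periodic hard copy causes.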
Upvotes: 0