Reputation: 339
I implemented DQN from scratch in Java; everything is custom made. I made it play Snake and the results are really good, but I have a problem.
To make the network as stable as possible, I'm using replay memory and also a target network. The network converges really well, but after some time it just breaks.
Here is a graph of the results (X axis: games played, Y axis: average points scored).
This 'break' usually happens a few games after I update the target network with the policy network.
Settings I use for DQN (a sketch of how they plug into my update loop follows this list):
discount factor: 0.9
learning rate: 0.001
steps to update target network: 300 000 (every 300k steps the target network is overwritten with the policy network's weights)
replay memory size: 300 000
replay memory batch size: 256 (every step I sample 256 transitions from replay memory and train the network)
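Roughly, this is how those settings plug into my update loop (simplified sketch only; `Network` and `Transition` below are placeholders, not my real classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Placeholder for my real network class: trains on a batch using the target network.
interface Network {
    void train(List<Transition> batch, Network target, double gamma, double learningRate);
    Network copy(); // deep copy of all weights
}

// One stored experience: (state, action, reward, next state, terminal flag).
record Transition(double[] state, int action, double reward,
                  double[] nextState, boolean terminal) {}

class DqnLoopSketch {
    static final double GAMMA = 0.9;
    static final double LEARNING_RATE = 0.001;
    static final int TARGET_UPDATE_STEPS = 300_000;
    static final int REPLAY_CAPACITY = 300_000;
    static final int BATCH_SIZE = 256;

    final List<Transition> replay = new ArrayList<>(); // used as a ring buffer
    final Random rng = new Random();
    int writeIndex = 0;
    long steps = 0;

    Network policy; // trained every step
    Network target; // frozen copy that provides the bootstrap targets

    void step(Transition t) {
        // 1. Store the transition, overwriting the oldest one once the buffer is full.
        if (replay.size() < REPLAY_CAPACITY) {
            replay.add(t);
        } else {
            replay.set(writeIndex, t);
            writeIndex = (writeIndex + 1) % REPLAY_CAPACITY;
        }

        // 2. Every step, sample 256 random transitions and train the policy network.
        if (replay.size() >= BATCH_SIZE) {
            List<Transition> batch = new ArrayList<>(BATCH_SIZE);
            for (int i = 0; i < BATCH_SIZE; i++) {
                batch.add(replay.get(rng.nextInt(replay.size())));
            }
            policy.train(batch, target, GAMMA, LEARNING_RATE);
        }

        // 3. Every 300k steps, hard-copy the policy weights into the target network.
        //    The collapse in the graph starts a few games after this copy happens.
        if (++steps % TARGET_UPDATE_STEPS == 0) {
            target = policy.copy();
        }
    }
}
```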
Any ideas what could be wrong? Thanks for any answers.
Upvotes: 0
Views: 524
Reputation: 1
Look up "catastrophic forgetting".
Try adjusting your replay-memory size and the number of steps to update your target network.
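If the hard copy every 300k steps turns out to be the trigger, one common variant worth trying (not something your current setup uses) is a soft, or Polyak, target update: every step, blend a small fraction of the policy weights into the target weights instead of overwriting them all at once, so the bootstrap targets never jump abruptly. A minimal sketch, assuming the weights can be exposed as `double[layer][param]` arrays (that layout is only for illustration):

```java
// Soft ("Polyak") target update: target <- (1 - tau) * target + tau * policy.
final class SoftTargetUpdate {
    static final double TAU = 0.001; // small blend factor, tuned like any other hyperparameter

    static void update(double[][] targetWeights, double[][] policyWeights) {
        for (int layer = 0; layer < targetWeights.length; layer++) {
            for (int i = 0; i < targetWeights[layer].length; i++) {
                targetWeights[layer][i] =
                        (1.0 - TAU) * targetWeights[layer][i]
                        + TAU * policyWeights[layer][i];
            }
        }
    }
}
```

Called every training step, this keeps the target network trailing the policy network smoothly, which tends to avoid the sudden shift in Q-targets that a periodic hard copy causes.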
Upvotes: 0