I am training a DQN and the Q-value keeps going down, and the training curve looks very strange.
Each step on the curve corresponds to one update of the target network. What could be causing this?
Does each step correspond to an update of the target Q-network? If so, try to:
1) update the target Q-network less frequently;
2) increase the discount factor (e.g. to 0.99 if you were using 0.5);
3) use a soft (smooth) update for the target Q-network of the form θ' ← (1 − τ)θ' + τθ, where θ' are the target-network weights and θ are the online-network weights.
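A minimal sketch of the two target-update schemes from items 1 and 3, assuming the network weights are flattened into a plain list of floats (a stand-in for real framework tensors). The function names `soft_update` and `hard_update` and the `period` parameter are hypothetical, chosen only for illustration. A soft update moves the target a fraction τ toward the online weights every step, while a hard update copies them only every `period` steps.

```python
from typing import List

# Hypothetical stand-in for network weights: a flat list of floats.
Params = List[float]


def soft_update(target: Params, online: Params, tau: float) -> Params:
    """Polyak averaging: theta' <- (1 - tau) * theta' + tau * theta."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    # Each target weight drifts a fraction tau toward the online weight.
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def hard_update(target: Params, online: Params, step: int, period: int) -> Params:
    """Copy the online weights into the target every `period` steps.

    A larger `period` means the target network is updated less frequently.
    """
    if step % period == 0:
        return list(online)
    return list(target)


if __name__ == "__main__":
    target = [0.0, 0.0]
    online = [1.0, 2.0]
    # Small tau: the target only moves 10% of the way toward the online weights.
    print(soft_update(target, online, tau=0.1))
    # Hard update on a non-multiple of the period leaves the target unchanged.
    print(hard_update(target, online, step=3, period=5))
```

With τ = 1 the soft update reduces to a full copy, so a small τ (for example 0.01, an illustrative value) is what makes the target change slowly and smooths the bootstrapped targets.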