Me- La Ría

Reputation: 51

DQN performance swinging

I'm using DDQN with experience replay, following this tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html, except that I make the problem a little harder by obscuring x_dot and theta_dot (the cart velocity and the angular velocity of the pole). From the observed states I then calculate the previous x_dot, theta_dot, x_dot_dot and theta_dot_dot, and run the learning process on this state space: (x, prev_x_dot, prev_prev_x_dot_dot, theta, prev_theta_dot, prev_prev_theta_dot_dot).
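For reference, here is a simplified sketch of how such a state can be constructed from the raw x and theta observations, using finite differences over CartPole's integration step (tau = 0.02); the class and function names are just illustrative, not from my actual code:

```python
import numpy as np

DT = 0.02  # CartPole's integration time step (tau)

class DerivativeEstimator:
    """Finite-difference estimates of a scalar's first and second
    derivatives from consecutive samples. The estimates naturally lag
    by one (velocity) and two (acceleration) samples, hence the
    prev_/prev_prev_ naming in the state space."""
    def __init__(self):
        self.prev_val = None
        self.prev_deriv = None

    def update(self, val):
        deriv = 0.0 if self.prev_val is None else (val - self.prev_val) / DT
        second = 0.0 if self.prev_deriv is None else (deriv - self.prev_deriv) / DT
        self.prev_val, self.prev_deriv = val, deriv
        return deriv, second

def build_state(x, theta, x_est, theta_est):
    # -> (x, prev_x_dot, prev_prev_x_dot_dot, theta, prev_theta_dot, prev_prev_theta_dot_dot)
    x_dot, x_ddot = x_est.update(x)
    theta_dot, theta_ddot = theta_est.update(theta)
    return np.array([x, x_dot, x_ddot, theta, theta_dot, theta_ddot],
                    dtype=np.float32)
```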

Anyway, my main issue is that with the DQN algorithm as described in the linked tutorial, learning does not converge. I consider the learning successful if the average length of the last 100 episodes is > 450. During training I may see 50-60 consecutive 500-step episodes, but then the episode length randomly swings and drops to as low as 20. I also want to push the problem harder by starting each episode from an arbitrary initial position within a certain range (for both x and theta), but the results so far have not been promising.
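The randomized starts look roughly like this (a sketch; writing to env.unwrapped.state works for Gymnasium's classic-control environments but is not a stable public API, and the ranges here are just examples):

```python
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")

def reset_random(env, x_range=1.0, theta_range=0.1):
    """Reset, then overwrite CartPole's default initial state
    (uniform in [-0.05, 0.05] for all components) with a wider range."""
    env.reset()
    # CartPole state layout: (x, x_dot, theta, theta_dot)
    env.unwrapped.state = np.array([
        np.random.uniform(-x_range, x_range),          # cart position
        0.0,                                           # cart velocity
        np.random.uniform(-theta_range, theta_range),  # pole angle (rad)
        0.0,                                           # pole angular velocity
    ])
    return np.array(env.unwrapped.state, dtype=np.float32)
```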

Is this normal behaviour for an algorithm like DQN? I understand that, since the policy is computed from previous executions, the loss may have convergence issues, but does that account for such severe swings in performance?

I'm using a network with three nonlinear hidden layers, built from 256x256 linear layers.

Upvotes: -1

Views: 32

Answers (1)

lejlot

Reputation: 66815

In general, DQN has little to no convergence guarantees. For first, exploratory experiments it might be better to start with a standard policy gradient method, which under simple conditions (a small enough learning rate / a big enough batch size) will converge... though potentially very slowly. Q-learning has nice properties due to its ability to learn off-policy etc., but if you are OK with learning directly from experience, policy gradient is a more grounded method (as it has convergence guarantees even with deep networks).
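For illustration, a minimal REINFORCE-style policy gradient for CartPole could look roughly like this (hyperparameters are illustrative, not tuned):

```python
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2),  # logits over the two actions
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(1000):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards, then normalized for stability.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy gradient loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```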

And to answer the question more explicitly: yes, DQN can diverge, behave chaotically, etc.; it is not out of the ordinary.

Upvotes: 0
