Reputation: 35
I'm experimenting with deep Q-learning using Keras, and I want to teach an agent to perform a task.
In my problem, I want the agent to learn to avoid hitting objects in its path by changing its speed (accelerating or decelerating).
The agent moves horizontally while the objects to avoid move vertically, and I want it to learn to adjust its speed so that it avoids hitting them. I based my code on this: Keras-FlappyBird
I tried 3 different models (I'm not using a convolutional network):

a model with 10 dense hidden layers with sigmoid activation, with 400 output nodes
a model with 10 dense hidden layers with Leaky ReLU activation
a model with 10 dense hidden layers with ReLU activation, with 400 output nodes

I feed the network the coordinates and speeds of all the objects in my world.
I trained each one for 1 million frames but still can't see any result. Here are my Q-value plots for the models:

Model 1: q-value plot
Model 2: q-value plot

As you can see, the Q-values aren't improving at all, and the same goes for the reward... Please help me, what am I doing wrong?
Upvotes: 2
Views: 4621
Reputation: 1048
I am a little confused by your environment. I am assuming that your problem is not Flappy Bird, and that you are trying to port the Flappy Bird code over to your own environment. So even though I don't know your environment or your code, I think there is enough here to point out some potential issues and get you on the right track.
First, you mention the three models that you have tried. Picking the right function approximator is of course very important for generalized reinforcement learning, but there are many more hyper-parameters that can be important in solving your problem: gamma, the learning rate, the exploration rate and its decay, the replay memory length in certain cases, the training batch size, etc. Since your Q-values are not changing in states where you believe they should, I suspect that too little exploration is being done for models one and two. In the code example, epsilon starts at 0.1; try different values there, up to 1. That will also require adjusting the decay rate of the exploration rate. If your Q-values shoot up drastically across episodes, look at the learning rate as well (although in the code sample it looks pretty small). On the same note, gamma can be extremely important: if it is too small, your learner will be myopic.
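As a rough illustration only (the names and values below are placeholders, not your code), an epsilon-greedy schedule that starts at 1.0 and decays over episodes looks something like this:

```python
import random
import numpy as np

# Illustrative values only -- tune these for your own environment.
GAMMA = 0.99            # discount factor; too small and the learner becomes myopic
LEARNING_RATE = 1e-4    # keep this small if Q-values shoot up across episodes
EPSILON_START = 1.0     # start fully exploratory instead of 0.1
EPSILON_MIN = 0.05
EPSILON_DECAY = 0.995   # applied once per episode

epsilon = EPSILON_START

def select_action(model, state, n_actions, epsilon):
    """Epsilon-greedy selection over the network's outputs."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                    # explore
    q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))                           # exploit

# at the end of each episode:
# epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```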
You also mention that you have 400 output nodes. Does your environment really have 400 actions? Large action spaces come with their own set of challenges; here is a good paper to look at if you do indeed have 400 actions: https://arxiv.org/pdf/1512.07679.pdf. If you do not have 400 actions, something is wrong with your network structure. Each output node should correspond to one action, and the agent picks the action whose output value is highest. For example, in the code example you posted, they have two actions and use relu.
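If your environment really only needs a handful of actions (say accelerate, decelerate, keep current speed), the output layer should be that small. A minimal sketch, assuming three actions and a flat state vector (the sizes are placeholders, not taken from your code):

```python
from keras.models import Sequential
from keras.layers import Dense

N_ACTIONS = 3       # e.g. accelerate, decelerate, keep current speed
STATE_SIZE = 20     # placeholder: number of features fed to the network

model = Sequential([
    Dense(64, activation='relu', input_shape=(STATE_SIZE,)),
    Dense(64, activation='relu'),
    Dense(N_ACTIONS, activation='linear'),   # one output node per action
])
model.compile(optimizer='adam', loss='mse')
```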
Getting the parameters of deep Q-learning right is very difficult, especially when you account for how slow training is.
Upvotes: 1