Reputation: 1439
I'm trying to implement DDPG with TensorFlow 2. The problem is that it doesn't learn: even after adding some noise and an exploration vs. exploitation factor, the agent gets stuck in a generic direction every time, only changing its intensity.
This is my Actor neural network:
d1 = self.dense(states, weights[0], weights[1])
d1 = tf.nn.relu(d1)
d2 = self.dense(d1, weights[2], weights[3])
d2 = tf.nn.relu(d2)
d3 = self.dense(d2, weights[4], weights[5])
d3 = tf.nn.tanh(d3)                 # squash the output to [-1, 1]
return d3 * self.action_bounds      # scale to the environment's action range
and this is its training function:
def train(self, states, critic_gradients):
    with tf.GradientTape() as t:
        actor_pred = self.network(states)
    # Chain rule: dQ/dtheta = dQ/da * da/dtheta, so the critic's action
    # gradients are passed (negated, for gradient ascent on Q) as output_gradients.
    actor_gradients = t.gradient(actor_pred, self.weights,
                                 output_gradients=-critic_gradients)
    # Average over the batch before applying
    actor_gradients = [g / self.batch_size for g in actor_gradients]
    self.opt.apply_gradients(zip(actor_gradients, self.weights))
Here critic_gradients are provided by the critic class (the gradient of the critic's output with respect to the actions).
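For context, those action gradients can be computed in the critic class with a second GradientTape, roughly like this (a simplified sketch; the method name and the axis value are assumptions based on the snippets below, not the exact code):

def action_gradients(self, states, actions):
    # Returns dQ(s, a)/da, which the actor's train() receives as critic_gradients.
    with tf.GradientTape() as t:
        t.watch(actions)  # actions are plain tensors, so watch them explicitly
        q_values = self._network(states, actions, self.weights, axis=1)
    return t.gradient(q_values, actions)  # shape [batch_size, action_size]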
The critic network is similar to the actor's:
def _network(self, states, actions, weights, axis):
    x = tf.concat([states, actions], axis=axis)   # joint state-action input
    d1 = self.dense(x, weights[0], weights[1])
    d1 = tf.nn.relu(d1)
    d2 = self.dense(d1, weights[2], weights[3])
    d2 = tf.nn.relu(d2)
    d3 = self.dense(d2, weights[4], weights[5])
    d3 = tf.nn.relu(d3)                           # scalar Q-value output
    return d3
With weights:
self.shapes = [
    [self.state_size + self.action_size, 64],
    [64],
    [64, 32],
    [32],
    [32, 1],
    [1]
]
The critic is trained by minimizing a mean squared error loss.
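Roughly like this (a sketch; the signature and attribute names are assumptions based on the snippets above, and targets are precomputed TD targets y):

def train(self, states, actions, targets):
    with tf.GradientTape() as t:
        q = self._network(states, actions, self.weights, axis=1)
        loss = tf.reduce_mean(tf.square(targets - q))  # mean squared error
    grads = t.gradient(loss, self.weights)
    self.opt.apply_gradients(zip(grads, self.weights))
    return loss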
I can't tell whether the error is in the main loop (which I wrote following the original paper) or in the classes. One thing to note is that I tested the critic's network on a simple dataset and it converges. I don't know how to test the actor network on its own; I'm just using Gym with the Pendulum environment.
Upvotes: 0
Views: 509
Reputation: 201
You did not provide the full code of the algorithm. Check the DDPG paper for the network architectures and hyperparameters: the paper shows that the same algorithm configuration works well across different problems and environments. Make sure you are using target networks, experience replay, and exploration correctly.
Target networks make learning stable. For the critic network update you should actually use the outputs of the target actor and target critic networks, and compute the TD error with the Q-learning backup on a batch sampled from the replay buffer.
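As a rough sketch of that update (all names and hyperparameter values here are illustrative assumptions, not taken from your code; actor_target and critic_target stand for the target networks' forward passes):

gamma = 0.99   # discount factor
tau = 0.001    # soft-update rate used in the DDPG paper

# TD target computed with the *target* networks on a replay-buffer batch
next_actions = actor_target(next_states)
y = rewards + gamma * (1.0 - dones) * critic_target(next_states, next_actions)
critic.train(states, actions, y)   # e.g. minimize MSE between Q(s, a) and y

# Soft update: target weights slowly track the learned weights
for w_target, w in zip(target_weights, online_weights):
    w_target.assign(tau * w + (1.0 - tau) * w_target)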
For off-policy algorithms such as DDPG, exploration can be handled by simply adding noise directly to the action. You can choose the noise process depending on the environment (refer again to the paper and look at the Ornstein-Uhlenbeck noise process).
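For example, a minimal Ornstein-Uhlenbeck process (the parameter values are the common choices from the paper, treat them as assumptions):

import numpy as np

class OUNoise:
    """Temporally correlated exploration noise added to the actor's action."""
    def __init__(self, action_size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(action_size)
        self.theta = theta
        self.sigma = sigma
        self.state = self.mu.copy()

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(*self.state.shape)
        self.state += dx
        return self.state

# Usage: action = np.clip(actor(state) + noise.sample(), -action_bound, action_bound)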
Upvotes: 0