Reputation: 23
I'm trying to implement Dueling DQN, but it doesn't seem to learn when I build the network architecture this way:
X_input = Input(shape=(self.state_size,))
X = X_input
X = Dense(512, input_shape= (self.state_size,), activation="relu")(X_input)
X = Dense(260, activation="relu")(X)
X = Dense(100, activation="relu")(X)
state_value = Dense(1)(X)
state_value = Lambda(lambda v: v, output_shape=(self.action_size,))(state_value)
action_advantage = Dense(self.action_size)(X)
action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(self.action_size,))(action_advantage)
X = Add()([state_value, action_advantage])
model = Model(inputs = X_input, outputs = X)
model.compile(loss="mean_squared_error", optimizer=Adam(lr=self.learning_rate))
return model
I searched online and found some code (which worked much better than mine); the only difference was this line:
state_value = Lambda(lambda s: K.expand_dims(s[:, 0],-1), output_shape=(self.action_size,))(state_value)
Link to the code: https://github.com/pythonlessons/Reinforcement_Learning/blob/master/03_CartPole-reinforcement-learning_Dueling_DDQN/Cartpole_Double_DDQN.py#L31 I can't understand why mine isn't learning, because it does run. And I don't understand why he only takes the first value of each row of the tensor.
Upvotes: 2
Views: 1091
Reputation: 46
Expanding the dims of the state value ensures that the single state value gets added to every advantage value when the Add() happens.
You could also drop the Lambda functions entirely and write out the actual Q-value calculation directly:
X = (state_value + (action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True)))
The results will be the same, but the code might be a bit more readable.
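To see why this works: state_value has shape (batch, 1) and action_advantage has shape (batch, action_size), so plain tensor addition broadcasts the single state value across the action axis. A minimal standalone sketch with made-up numbers (2 states, 3 actions, just to illustrate the shapes):
import tensorflow as tf

# V(s) for a batch of 2 states, shape (2, 1)
state_value = tf.constant([[1.0], [2.0]])
# A(s, a) for 3 actions, shape (2, 3)
action_advantage = tf.constant([[0.5, -0.5, 0.0],
                                [1.0, 0.0, -1.0]])
# broadcasting repeats V(s) across the action axis
q_values = state_value + (action_advantage - tf.reduce_mean(action_advantage, axis=1, keepdims=True))
print(q_values.shape)  # (2, 3): one Q-value per action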
So in total, your code would look like this:
X_input = Input(shape=(self.state_size,))
X = Dense(512, activation="relu")(X_input)
X = Dense(260, activation="relu")(X)
X = Dense(100, activation="relu")(X)
state_value = Dense(1)(X)                      # V(s), shape (batch, 1)
action_advantage = Dense(self.action_size)(X)  # A(s, a), shape (batch, action_size)
# Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)); V(s) is broadcast over the action axis
X = state_value + (action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True))
model = Model(inputs=X_input, outputs=X)
model.compile(loss="mean_squared_error", optimizer=Adam(lr=self.learning_rate))  # use learning_rate= on newer Keras versions
return model
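As a quick sanity check you can build the model with concrete sizes outside the class and look at the output shape; a minimal sketch, assuming TF 2.x (tf.keras, where raw tf ops on Keras tensors are wrapped automatically) and hypothetical CartPole-like sizes state_size=4, action_size=2:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

state_size, action_size = 4, 2  # hypothetical CartPole-like sizes

X_input = Input(shape=(state_size,))
X = Dense(512, activation="relu")(X_input)
X = Dense(260, activation="relu")(X)
X = Dense(100, activation="relu")(X)
state_value = Dense(1)(X)
action_advantage = Dense(action_size)(X)
Q = state_value + (action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True))
model = Model(inputs=X_input, outputs=Q)

print(model.output_shape)  # (None, 2): one Q-value per action
print(model.predict(np.zeros((1, state_size)), verbose=0))  # one row of 2 Q-values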
Upvotes: 3