Aly Saeed

Reputation: 23

Dueling DQN with Keras

I'm trying to implement Dueling DQN, but it looks like it isn't learning when I build the NN architecture this way:

        X_input = Input(shape=(self.state_size,))
        X = X_input
        X = Dense(512, input_shape= (self.state_size,), activation="relu")(X_input)
        X = Dense(260, activation="relu")(X)
        X = Dense(100, activation="relu")(X)
        state_value = Dense(1)(X)
        state_value = Lambda(lambda v: v, output_shape=(self.action_size,))(state_value)
        action_advantage = Dense(self.action_size)(X)
        action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(self.action_size,))(action_advantage)
        X = Add()([state_value, action_advantage])
        model = Model(inputs = X_input, outputs = X)
        model.compile(loss="mean_squared_error", optimizer=Adam(lr=self.learning_rate))
        return model

I searched online and found some code (which worked way better than mine); the only difference was this line:

        state_value = Lambda(lambda s: K.expand_dims(s[:, 0],-1), output_shape=(self.action_size,))(state_value)

Link to the code: https://github.com/pythonlessons/Reinforcement_Learning/blob/master/03_CartPole-reinforcement-learning_Dueling_DDQN/Cartpole_Double_DDQN.py#L31. I can't understand why mine isn't learning, since it runs without errors. And why does he only take the first value of each row of the tensor?

Upvotes: 2

Views: 1091

Answers (1)

Mario Niemann

Reputation: 46

Expanding the dims of the state value ensures that it is broadcast and added to every advantage value when the Add() happens.
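For intuition, here is a minimal standalone sketch (with made-up numbers, not from the original code) of how a (batch, 1) state value broadcasts across a (batch, action_size) advantage tensor when the two are added:

import tensorflow as tf

state_value = tf.constant([[10.0], [20.0]])        # shape (2, 1): one V(s) per sample in the batch
action_advantage = tf.constant([[1.0, 2.0, 3.0],
                                [4.0, 5.0, 6.0]])   # shape (2, 3): one A(s, a) per action

q_values = state_value + action_advantage           # broadcasts to shape (2, 3)
print(q_values.numpy())
# [[11. 12. 13.]
#  [21. 22. 23.]]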

You could also write it without the Lambda functions by writing out the actual calculation of the Q-values directly:

X = (state_value + (action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True)))

The results will be the same, but the code might be a bit more readable.
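If you want to convince yourself of the equivalence, a small standalone check (with arbitrary toy values, not part of the original answer) could look like this:

import numpy as np
import tensorflow as tf

state_value = tf.constant([[2.0], [5.0]])                 # (batch, 1)
action_advantage = tf.constant([[1.0, 3.0], [0.0, 4.0]])  # (batch, action_size)

# centre the advantages, as in the dueling architecture
centered = action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True)

q_add = tf.keras.layers.Add()([state_value, centered])    # Add() layer style
q_plain = state_value + centered                          # plain arithmetic style

print(np.allclose(q_add.numpy(), q_plain.numpy()))        # True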

So in total, your code would look like this:

# Required imports (at module level):
# import tensorflow as tf
# from tensorflow.keras.layers import Input, Dense
# from tensorflow.keras.models import Model
# from tensorflow.keras.optimizers import Adam

X_input = Input(shape=(self.state_size,))
X = Dense(512, activation="relu")(X_input)
X = Dense(260, activation="relu")(X)
X = Dense(100, activation="relu")(X)
state_value = Dense(1)(X)                      # V(s), shape (batch, 1)
action_advantage = Dense(self.action_size)(X)  # A(s, a), shape (batch, action_size)

# Q(s, a) = V(s) + (A(s, a) - mean over actions of A(s, a)); the (batch, 1) value broadcasts over the actions
X = state_value + (action_advantage - tf.math.reduce_mean(action_advantage, axis=1, keepdims=True))

model = Model(inputs=X_input, outputs=X)
model.compile(loss="mean_squared_error", optimizer=Adam(lr=self.learning_rate))
return model
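
If you want to sanity-check the head end to end outside of your class, a standalone sketch (with hypothetical sizes, not part of the original answer) could build the same dueling structure and pick a greedy action from its Q-values:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

state_size, action_size = 4, 2          # assumed CartPole-like dimensions

inp = Input(shape=(state_size,))
h = Dense(32, activation="relu")(inp)   # small hidden layer just for the demo
v = Dense(1)(h)                         # V(s), shape (batch, 1)
a = Dense(action_size)(h)               # A(s, a), shape (batch, action_size)
q = v + (a - tf.math.reduce_mean(a, axis=1, keepdims=True))
model = Model(inputs=inp, outputs=q)

state = np.random.rand(1, state_size).astype("float32")
q_values = model.predict(state)         # shape (1, action_size)
action = int(np.argmax(q_values[0]))    # greedy action for this state
print(q_values, action)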

Upvotes: 3
