ZeroMaxinumXZ

Reputation: 387

Why is my reward function returning None in Python?

OK, so, I'm trying to make an intrinsic-curiosity agent using Keras and TensorFlow. The agent's reward function is the difference between two autoencoder losses: the loss between the previous and current state, and the loss between the current state and an imagined next state. However, this reward function always returns None instead of the actual difference. I've tried printing the losses, and they always show the correct values.

Reward function/replay code:

    def replay(self, batch):
        minibatch = R.sample(self.memory, batch)

        for prev_state, actions, state, reward, imagined_next_state in minibatch:
            target = []

            imagined_next_state = np.add(np.random.random(self.state_size), imagined_next_state)
            target_m = self.model.predict(state)
            for i in range(len(target_m)):
                target_m[i][0][actions[i]]=reward

            history_m = self.model.fit(state, target_m, epochs=1, verbose=0)
            history_ae_ps = self.autoencoder.fit(prev_state, state, epochs=1, verbose=0)
            history_ae_ns = self.autoencoder.fit(state, imagined_next_state, epochs=1, verbose=0)

            loss_m = history_m.history['loss'][-1]
            loss_ae_ps = history_ae_ps.history['loss'][-1]
            loss_ae_ns = history_ae_ns.history['loss'][-1]
            print("LOSS AE PS:", loss_ae_ps)
            print("LOSS AE NS:", loss_ae_ns)

            loss_ae = loss_ae_ns - loss_ae_ps
            print(reward, loss_ae)
            return loss_ae

Agent environment loop code:

    def loop(self, times='inf'):
        if times == 'inf':
            times = 2**31

        reward = 0.0001
        prev_shot = self.get_shot()

        for i in range(times):
            acts, ins, act_probs, shot = self.get_act()

            act_0 = acts[0]
            act_1 = acts[1]
            act_2 = acts[2]
            act_3 = acts[3]

            self.act_to_mouse(act_0, act_1)
            self.act_to_click(act_2)
            self.act_to_keys(act_3)

            reward = self.remember_and_replay(prev_shot, acts, shot, reward, ins)
            if reward is None:
                raise RewardError("Rewards are none.")
            prev_shot = shot

Upvotes: 0

Views: 156

Answers (1)

ZeroMaxinumXZ

Reputation: 387

I just solved it while typing the question. I simply wasn't returning the reward from the remember_and_replay method, and a Python function that finishes without a return statement returns None.

The remember_and_replay method looked like this:

    def remember_and_replay(self, prev_shot, action, shot, reward, ins):
        self.dqn.remember(prev_shot, action, shot, reward, ins)
        self.dqn.replay(1)

when it should have been like this:

    def remember_and_replay(self, prev_shot, action, shot, reward, ins):
        self.dqn.remember(prev_shot, action, shot, reward, ins)
        rew = self.dqn.replay(1)
        return rew
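
For anyone hitting the same thing, here is a minimal sketch of the pitfall outside of my agent. Everything in it is a made-up stand-in: `FakeDQN`, `broken_remember_and_replay`, `fixed_remember_and_replay`, and the value 0.25 are hypothetical and only illustrate how a discarded return value turns into None.

```python
class FakeDQN:
    """Hypothetical stand-in for the real agent (not the actual DQN class)."""

    def remember(self, *transition):
        pass  # the real method would store the transition in memory

    def replay(self, batch):
        return 0.25  # placeholder for the autoencoder loss difference


def broken_remember_and_replay(dqn, *transition):
    dqn.remember(*transition)
    dqn.replay(1)  # result is discarded, so the function implicitly returns None


def fixed_remember_and_replay(dqn, *transition):
    dqn.remember(*transition)
    return dqn.replay(1)  # hand the reward back to the caller


dqn = FakeDQN()
broken = broken_remember_and_replay(dqn, "prev", [0], "cur", 0.0, "ins")
fixed = fixed_remember_and_replay(dqn, "prev", [0], "cur", 0.0, "ins")
print(broken)  # None
print(fixed)   # 0.25
```

This is also why printing the losses inside replay looked fine: the value was computed correctly there and only lost one call higher up.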

Hope this helps someone else. :)

Upvotes: 0
