Reputation: 387
OK, so I'm trying to make an intrinsic-curiosity agent using Keras and TensorFlow. The agent's reward is the difference between the autoencoder's loss going from the previous state to the current state and its loss going from the current state to an imagined next state. However, this reward function always returns None instead of the actual difference. When I print the losses inside the function, they have the correct values.
Reward function/replay code:
def replay(self, batch):
    minibatch = R.sample(self.memory, batch)
    for prev_state, actions, state, reward, imagined_next_state in minibatch:
        target = []
        # Perturb the imagined next state with uniform noise
        imagined_next_state = np.add(np.random.random(self.state_size), imagined_next_state)
        target_m = self.model.predict(state)
        for i in range(len(target_m)):
            target_m[i][0][actions[i]] = reward
        history_m = self.model.fit(state, target_m, epochs=1, verbose=0)
        # Autoencoder loss: previous state -> current state
        history_ae_ps = self.autoencoder.fit(prev_state, state, epochs=1, verbose=0)
        # Autoencoder loss: current state -> imagined next state
        history_ae_ns = self.autoencoder.fit(state, imagined_next_state, epochs=1, verbose=0)
        loss_m = history_m.history['loss'][-1]
        loss_ae_ps = history_ae_ps.history['loss'][-1]
        loss_ae_ns = history_ae_ns.history['loss'][-1]
        print("LOSS AE PS:", loss_ae_ps)
        print("LOSS AE NS:", loss_ae_ns)
        loss_ae = loss_ae_ns - loss_ae_ps
        print(reward, loss_ae)
        return loss_ae
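A side note on how the losses are read here: each fit call also performs a gradient step just to obtain a loss value. Keras models also have evaluate, which returns the loss without updating weights, so the reward as described could be computed like this; a minimal sketch, assuming autoencoder is a compiled Keras model and the states are batched NumPy arrays (curiosity_reward is a hypothetical helper, not part of my code):

def curiosity_reward(autoencoder, prev_state, state, imagined_next_state):
    # Loss reconstructing the current state from the previous state (no weight update)
    loss_prev = autoencoder.evaluate(prev_state, state, verbose=0)
    # Loss reconstructing the imagined next state from the current state
    loss_next = autoencoder.evaluate(state, imagined_next_state, verbose=0)
    # evaluate returns a list when extra metrics are compiled in; take the loss entry
    if isinstance(loss_prev, list):
        loss_prev, loss_next = loss_prev[0], loss_next[0]
    return loss_next - loss_prev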
Agent environment loop code:
def loop(self, times='inf'):
    if times == 'inf':
        times = 2**31  # effectively unbounded
    reward = 0.0001
    prev_shot = self.get_shot()
    for i in range(times):
        acts, ins, act_probs, shot = self.get_act()
        act_0 = acts[0]
        act_1 = acts[1]
        act_2 = acts[2]
        act_3 = acts[3]
        self.act_to_mouse(act_0, act_1)
        self.act_to_click(act_2)
        self.act_to_keys(act_3)
        reward = self.remember_and_replay(prev_shot, acts, shot, reward, ins)
        if reward is None:
            raise RewardError("Rewards are None.")
        prev_shot = shot
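(Unrelated to the bug: if the loop ever needs to run truly indefinitely rather than for 2**31 steps, itertools.count gives an unbounded iterator. A minimal sketch, with run_steps and step_fn as hypothetical names:)

from itertools import count

def run_steps(step_fn, times=None):
    # times=None runs until step_fn raises or the process is stopped
    iterator = count() if times is None else range(times)
    for i in iterator:
        step_fn(i)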
Upvotes: 0
Views: 156
Reputation: 387
I just solved it while typing the question. I simply wasn't returning the reward in the remember_and_replay method...
The remember_and_replay method looked like this:
def remember_and_replay(self, prev_shot, action, shot, reward, ins):
    self.dqn.remember(prev_shot, action, shot, reward, ins)
    self.dqn.replay(1)  # replay's return value is dropped, so this method returns None
when it should have been like this:
def remember_and_replay(self, prev_shot, action, shot, reward, ins):
    self.dqn.remember(prev_shot, action, shot, reward, ins)
    rew = self.dqn.replay(1)
    return rew
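The general rule, for anyone hitting the same symptom: a Python function that finishes without an explicit return statement returns None. A tiny demonstration:

def no_return():
    1 + 1  # computed, but never returned

print(no_return())  # prints: None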
Hope this helps anyone else. :)
Upvotes: 0