Reputation: 21
I'm working on a Q-learning project using OpenAI Gym and PyBullet drones (gym-pybullet-drones). My goal is to control the drone's altitude so that it reaches a height of 1 and stays stable there. I'm using the discrete actions 0, 1, and 2, which correspond to the motor commands [0 0 0 0], [1 1 1 1], and [-1 -1 -1 -1] respectively. Initially I set the reward to `reward = (1 - next_state)**2`, but I noticed that the reward and the drone's altitude moved in opposite directions: as the drone descended, the reward increased. When I didn't add any reward function of my own, the drone settled at a height of 1.5 instead of 1.
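For reference, a minimal sketch of a sign-corrected version of that attempt (assuming `next_state` is the drone's z-coordinate); negating the squared error makes the reward largest at the target height instead of far away from it:

```python
# Hypothetical fix: penalize the squared distance from the target height,
# so the maximum reward (0) occurs at z = 1 and falls off on both sides.
TARGET_Z = 1.0

def altitude_reward(z):
    return -(z - TARGET_Z) ** 2
```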
The environment's built-in reward function:

```python
def _computeReward(self):
    state = self._getDroneStateVector(0)
    # Reward based on the distance between the target position and the drone
    ret = max(0, 2 - np.linalg.norm(self.TARGET_POS - state[0:3])**4)
    return ret
```
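Assuming the default `TARGET_POS` of `[0, 0, 1]`, this built-in reward is nearly flat around the target, which would be consistent with the drone settling near 1.5 rather than exactly 1. A quick numeric check:

```python
import numpy as np

TARGET_POS = np.array([0.0, 0.0, 1.0])  # assumed default target

for z in [0.5, 1.0, 1.25, 1.5, 2.0]:
    pos = np.array([0.0, 0.0, z])
    r = max(0, 2 - np.linalg.norm(TARGET_POS - pos)**4)
    print(f"z={z:.2f} -> reward={r:.4f}")

# z=1.50 still earns ~1.94 out of a maximum of 2.00, so the
# gradient pushing the drone back toward z=1.00 is weak.
```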
[screenshot](https://i.sstatic.net/AB4xE48J.png)
Here is my `get_action()` function:

```python
import random
import numpy as np

def get_action(q_values, epsilon):
    # Epsilon-greedy: exploit with probability 1 - epsilon, otherwise explore.
    if random.random() > epsilon:
        return np.argmax(q_values.numpy()[0])
    else:
        return random.choice(np.arange(3))
```
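A quick sanity check of the epsilon-greedy behaviour (assuming `q_network` outputs a TensorFlow tensor of shape `(1, 3)`, which is what `q_values.numpy()[0]` suggests):

```python
import tensorflow as tf

# Hypothetical stand-in for a q_network output
q_values = tf.constant([[0.1, 0.9, -0.3]])

print(get_action(q_values, epsilon=0.0))  # always greedy -> 1
print(get_action(q_values, epsilon=1.0))  # always random -> 0, 1, or 2
```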
And my training loop:

```python
for i in range(num_episodes):
    obs, info = env.reset()
    state = obs[0][2]  # z-coordinate of the drone
    total_points = 0
    for t in range(max_num_timesteps):
        state_qn = np.expand_dims(state, axis=0)  # add a batch dimension
        q_values = q_network(state_qn)
        action = utils.get_action(q_values, epsilon)  # discrete index: 0, 1 or 2
        a = np.array([[-1, -1, -1, -1]])
        env_action = (action + a).reshape(1, -1)  # map the index to a 4-motor command
        next_state, reward, terminated, truncated, info = env.step(env_action)
        done = terminated or truncated
        next_state = next_state[0][2]
        # reward = (1 - next_state)**2  # the attempt described above
        memory_buffer.append(experience(state, action, reward, next_state, done))
        update = utils.check_update_conditions(t, NUM_STEPS_FOR_UPDATE, memory_buffer)
        if update:
            experiences = utils.get_experiences(memory_buffer)
            agent_learn(experiences, GAMMA)
        state = next_state
        total_points += reward
```
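For reference, a quick trace of what the `action + a` mapping actually sends to the environment:

```python
import numpy as np

a = np.array([[-1, -1, -1, -1]])
for idx in range(3):
    print(idx, "->", (idx + a).reshape(1, -1))

# 0 -> [[-1 -1 -1 -1]], 1 -> [[0 0 0 0]], 2 -> [[1 1 1 1]]
```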
What kind of reward function can I define to fix this behaviour, or what else should I change?
Upvotes: 1
Views: 102