Reputation: 1
episode_number = 10000
for i in range(1, episode_number):
    state = env.reset()
    reward_count = 0
    dropouts = 0
    while True:
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])
        next_state, reward, done, _ = env.step(action)
        old_value = q_table[state, action]
        next_max = np.max(q_table[next_state])
        next_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
        q_table[state, action] = next_value
        state = next_state
        if reward == -10:
            dropouts += 1
        if done:
            break
        reward_count += reward
    if i % 10 == 0:
        dropout_list.append(dropouts)
        reward_list.append(reward_count)
        print("Episode: {}, reward {}, wrong dropout {}".format(i, reward_count, dropouts))
I was asked to extend this code to show a comparison of rewards and penalties. Specifically, it should display the rewards earned before training the agent alongside the rewards earned after training it. The two curves must be plotted on the same graph so that they overlap, but I could not find a way to do that. I have been trying for days without success, and I hope someone can help.
If this needs new or separate code whose results are then compared, please let me know. Thank you.
Upvotes: 0
Views: 258
Reputation: 1
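The assignment to next_value is not missing a term. It is algebraically the same as the temporal-difference form next_value = old_value + alpha*(reward + gamma*next_max - q_table[state, action]). Only add the - q_table[state, action] term if you also replace (1-alpha)*old_value with old_value, otherwise the old estimate is subtracted twice.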
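As a quick check on the update rule, here is a minimal sketch comparing the weighted-average form from the question with the temporal-difference form. The numbers are hypothetical and only there to exercise the formulas. The third line, `mixed`, shows what happens if the subtraction is added while keeping the `(1 - alpha)` weight.

```python
# Hypothetical values, chosen only to exercise the formulas.
alpha, gamma = 0.1, 0.9
old_value, reward, next_max = 2.0, -1.0, 5.0

# Form used in the question: weighted average of the old estimate and the target.
weighted = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)

# Temporal-difference form: old estimate plus alpha times the TD error.
td_form = old_value + alpha * (reward + gamma * next_max - old_value)

# Mixed form: keeps the (1 - alpha) weight AND subtracts old_value again.
mixed = (1 - alpha) * old_value + alpha * (reward + gamma * next_max - old_value)

print("weighted:", weighted)
print("td_form: ", td_form)
print("mixed:   ", mixed)
```

The first two values agree up to floating-point rounding, while the mixed form comes out lower by alpha*old_value.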
Regarding the plots you want to make, you can plot the rewards earned by an agent taking random actions on the same axes as the rewards earned by your agent after reinforcement learning, so that the two curves overlap.
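One possible way to get the two curves to overlap is to call `plot` twice on the same matplotlib axes. This is only a sketch: `plot_reward_comparison`, `rewards_before`, and `rewards_after` are names made up for this example, and the numbers in the call are placeholders rather than measured results.

```python
import matplotlib.pyplot as plt


def plot_reward_comparison(rewards_before, rewards_after):
    """Draw both reward curves on one set of axes so they overlap."""
    fig, ax = plt.subplots()
    ax.plot(range(len(rewards_before)), rewards_before,
            label="before training (random actions)")
    ax.plot(range(len(rewards_after)), rewards_after,
            label="after training (greedy on q_table)")
    ax.set_xlabel("evaluation episode")
    ax.set_ylabel("total reward")
    ax.legend()
    return fig


# Placeholder numbers only, to show the call; pass the rewards you record.
fig = plot_reward_comparison([-200, -180, -210, -190], [5, 8, 7, 9])
# plt.show()  # uncomment to display, or use fig.savefig("reward_comparison.png")
```

Because both calls go to the same `ax`, matplotlib draws the second line over the first and the legend tells them apart.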
It may not have been clear, but the code you are showing is the learning phase of the agent.
After you run it, q_table contains the estimated quality of each action in each state.
The algorithm for running the trained agent is then:
initialize environment
done := false
while not done
    s := current state
    a := argmax(q_table[s])
    update s and done by making the action a
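The loop above can be sketched in Python as a `run_episode` function that plays one episode without learning, using random actions when no table is given and greedy actions otherwise. Everything here is an assumption made for illustration: `CorridorEnv` is a tiny hypothetical stand-in for your real environment (it only copies the reset()/step() shape used in the question), and `run_episode`, `rewards_before`, and `rewards_after` are made-up names.

```python
import random
import numpy as np


class CorridorEnv:
    """Hypothetical toy environment: walk right from cell 0 to cell 4."""
    n_states, n_actions, max_steps = 5, 2, 50

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):  # action 0 = left, 1 = right
        self.steps += 1
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        reached_goal = self.pos == self.n_states - 1
        reward = 20 if reached_goal else -1
        done = reached_goal or self.steps >= self.max_steps
        return self.pos, reward, done, {}


def run_episode(env, q_table=None):
    """Play one episode with no learning; random actions if q_table is None."""
    state, total, done = env.reset(), 0, False
    while not done:
        if q_table is None:
            action = random.randrange(env.n_actions)
        else:
            action = int(np.argmax(q_table[state]))
        state, reward, done, _ = env.step(action)
        total += reward  # added before the loop exits, so the last reward counts
    return total


random.seed(0)
env = CorridorEnv()
alpha, gamma, epsilon = 0.1, 0.9, 0.1
q_table = np.zeros((env.n_states, env.n_actions))

# Before training: rewards from purely random actions.
rewards_before = [run_episode(env) for _ in range(20)]

# Training phase, using the same update as the code in the question.
for _ in range(500):
    state, done = env.reset(), False
    while not done:
        if random.uniform(0, 1) < epsilon:
            action = random.randrange(env.n_actions)
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done, _ = env.step(action)
        target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * target
        state = next_state

# After training: rewards from greedy actions on the learned table.
rewards_after = [run_episode(env, q_table) for _ in range(20)]
print("mean before:", sum(rewards_before) / len(rewards_before))
print("mean after: ", sum(rewards_after) / len(rewards_after))
```

The two lists, `rewards_before` and `rewards_after`, are what you would then draw on the same graph. With your own environment you would keep your training loop as it is and only add the two evaluation passes around it.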
I suggest you check this tutorial, which I think covers all of your questions.
Feel free to check the comment section of the post for the concerns regarding the plots.
I hope I have been helpful
Good luck in your work!
Upvotes: 0