Enhancement of Agent Training Q Learning Taxi V3

Question

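import random

import gym
import numpy as np

# Setup is not shown in the original post; everything in this block is an assumption.
env = gym.make("Taxi-v3")
q_table = np.zeros([env.observation_space.n, env.action_space.n])
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters

reward_list = []
dropout_list = []

episode_number = 10000

# Uses the older Gym API: reset() returns the state, step() returns 4 values.
for i in range(1, episode_number + 1):  # +1 so that all 10000 episodes run

    state = env.reset()

    reward_count = 0
    dropouts = 0

    while True:

        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])

        next_state, reward, done, _ = env.step(action)

        # Q-learning update
        old_value = q_table[state, action]
        next_max = np.max(q_table[next_state])
        next_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
        q_table[state, action] = next_value

        state = next_state

        # Add the reward before checking done, so the final step is counted
        reward_count += reward

        # A reward of -10 marks an illegal pickup or drop-off
        if reward == -10:
            dropouts += 1

        if done:
            break

    # Record every 10th episode
    if i % 10 == 0:
        dropout_list.append(dropouts)
        reward_list.append(reward_count)
        print("Episode: {}, reward {}, wrong dropout {}".format(i, reward_count, dropouts))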

I have been asked to extend this code so that it compares the rewards and penalties the agent earns before training with those it earns after training. Both sets of results must be plotted on the same graph, overlapping, so that the difference is visible. I have been trying for days but have not found a way to do this, and I would appreciate any help.

If this requires new or separate code whose results are then compared, please let me know. Thank you.
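
One possible way to get an overlapping comparison is sketched below. It is not a definitive solution. It assumes the older Gym API used in the code above (`env.reset()` returns the state, `env.step()` returns four values) and that "before training" means either random actions or an all-zero Q-table. The helper names `run_episodes` and `plot_comparison` are hypothetical and are not part of Gym.

```python
import numpy as np


def run_episodes(env, q_table=None, n_episodes=100, max_steps=200):
    """Run episodes without learning and return (rewards, penalties) per episode.

    q_table=None means random actions (an untrained baseline);
    otherwise the greedy action from q_table is used.
    """
    rewards, penalties = [], []
    for _ in range(n_episodes):
        state = env.reset()
        total_reward, wrong_dropoffs = 0, 0
        # Cap the steps: an untrained agent may never finish on its own.
        for _ in range(max_steps):
            if q_table is None:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if reward == -10:  # same penalty check as in the training loop
                wrong_dropoffs += 1
            if done:
                break
        rewards.append(total_reward)
        penalties.append(wrong_dropoffs)
    return rewards, penalties


def plot_comparison(before, after, ylabel="Total reward per episode"):
    """Draw both curves on the same axes so they overlap."""
    # Imported here so the evaluation helper works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(before, label="Before training", alpha=0.7)
    ax.plot(after, label="After training", alpha=0.7)
    ax.set_xlabel("Evaluation episode")
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig
```

To use it, call `rewards_before, penalties_before = run_episodes(env)` before the training loop and `rewards_after, penalties_after = run_episodes(env, q_table)` after it. Then pass each pair to `plot_comparison` and call `matplotlib.pyplot.show()`. Because both lines are drawn on the same `ax`, they overlap in one graph rather than appearing in separate figures.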

