Max
Max

Reputation: 133

q_learning agent with q_table does not learn

I have the following code of my q_learning algorithm that tries to solve the mountain car environment from gymnasium (formerly gym).

But even after 30.000 episodes, my agent does not learn how to get up the mountain. I don't understand why, since I update my table, I calculate all the values needed, etc.

Way down in the code, I explicitly state that if the agent makes it to the top, it should print something. But that never happens.

import gymnasium as gym
import numpy as np

env = gym.make("MountainCar-v0")
state, _ = env.reset()

lr = .1
gamma = .95

os_size = [20] * len(env.observation_space.high)
win_size = (env.observation_space.high - env.observation_space.low) / os_size

q_table = np.random.uniform(low = -10, high = 0, size=(os_size + [env.action_space.n]))

def get_discrete_state(state):
    
    discrete_state = (state - env.observation_space.low) / os_size
    return (tuple(discrete_state.astype(np.int64)))

for episode in range(30000):
    
    discrete_state = get_discrete_state(state)

    done = False

    while not done:
        
        action = np.argmax(q_table[discrete_state])
        new_state, reward, terminated, trunc, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)
        
        done = terminated or trunc
        
        if not done:
            max_future_q = np.max(q_table[new_discrete_state])
            current_q = np.max(q_table[new_discrete_state + (action, )])
            new_q = (1-lr) * current_q + lr * (current_q * gamma + reward)
            q_table[discrete_state+(action, )] = new_q
            
        elif new_state[0] >= env.unwrapped.goal_position:
            print(f"made it on ep {episode}")
            q_table[discrete_state+(action, )] = 0
            
        discrete_state = new_discrete_state
        
    env.reset()

Upvotes: 0

Views: 35

Answers (0)

Related Questions