I have the following Q-learning code that tries to solve the MountainCar environment from Gymnasium (formerly Gym).
But even after 30,000 episodes, my agent does not learn how to get up the mountain. I don't understand why, since I update my Q-table, calculate all the values needed, etc.
Near the bottom of the code, I explicitly print a message if the agent makes it to the top. But that never happens.
import gymnasium as gym
import numpy as np

env = gym.make("MountainCar-v0")
state, _ = env.reset()

lr = .1
gamma = .95

# 20 buckets per observation dimension
os_size = [20] * len(env.observation_space.high)
win_size = (env.observation_space.high - env.observation_space.low) / os_size

# one Q-value per (discretized state, action), initialized to random negative values
q_table = np.random.uniform(low=-10, high=0, size=(os_size + [env.action_space.n]))

def get_discrete_state(state):
    # map a continuous observation to bucket indices
    discrete_state = (state - env.observation_space.low) / os_size
    return tuple(discrete_state.astype(np.int64))

for episode in range(30000):
    discrete_state = get_discrete_state(state)
    done = False
    while not done:
        # always take the currently best-valued action (greedy)
        action = np.argmax(q_table[discrete_state])
        new_state, reward, terminated, trunc, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)
        done = terminated or trunc
        if not done:
            max_future_q = np.max(q_table[new_discrete_state])
            current_q = np.max(q_table[new_discrete_state + (action,)])
            new_q = (1 - lr) * current_q + lr * (current_q * gamma + reward)
            q_table[discrete_state + (action,)] = new_q
        elif new_state[0] >= env.unwrapped.goal_position:
            # the car reached the flag
            print(f"made it on ep {episode}")
            q_table[discrete_state + (action,)] = 0
        discrete_state = new_discrete_state
    env.reset()
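
For reference, the update I am trying to implement is (as I understand it) the standard tabular Q-learning rule. In terms of the variables above it would look roughly like this (a sketch of my intent, not a copy of my code):

    # Q(s, a) <- (1 - lr) * Q(s, a) + lr * (reward + gamma * max_a' Q(s', a'))
    current_q = q_table[discrete_state + (action,)]
    max_future_q = np.max(q_table[new_discrete_state])
    q_table[discrete_state + (action,)] = (1 - lr) * current_q + lr * (reward + gamma * max_future_q)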