Reputation: 38
I'm currently working on implementing Q-learning for the FrozenLake-v1 environment in Gymnasium (OpenAI Gym). However, my agent seems to take a lot of unnecessary steps to reach the goal. I've reviewed my code multiple times, but I can't pinpoint the issue.
Here's the code I'm using:
import random
import numpy as np
import gymnasium as gym

def argmax(arr):
    # Break ties randomly among the maximal entries
    arr_max = np.max(arr)
    return np.random.choice(np.where(arr == arr_max)[0])

def save_q_table(Q):
    np.savetxt("q_table.csv", Q, delimiter=",")

def load_q_table():
    return np.loadtxt("q_table.csv", delimiter=",")

def run(training):
    if not training:
        env = gym.make("FrozenLake-v1", render_mode='human')
    else:
        env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))  # empty q_table
    if not training:
        Q = load_q_table()
    alpha = 0.8
    gamma = 0.95
    episode = 0
    episodes = 10000
    epsilon = 0.95
    epsilon_decay = (2 * epsilon) / episodes
    epsilon_min = 0.05
    env.metadata['render_fps'] = 10
    state, info = env.reset()
    while episode < episodes:
        # Epsilon-greedy action selection (explore only while training)
        if random.random() < epsilon and training:
            action = env.action_space.sample()
        else:
            action = argmax(Q[state])
        new_state, reward, terminated, truncated, info = env.step(action)
        if training:
            # Q-learning update
            Q[state, action] = Q[state, action] + alpha * (
                float(reward) + gamma * np.max(Q[new_state]) - Q[state, action])
        state = new_state
        if terminated or truncated:
            # Decay epsilon once per finished episode
            if epsilon > epsilon_min:
                epsilon -= epsilon_decay
            episode += 1
            # save on last episode
            if training and episode == episodes:
                print("Saving Q table")
                save_q_table(Q)
            print("Episode: ", episode, "Epsilon: ", round(epsilon, 2), "Reward: ", reward)
            state, info = env.reset()  # Reset the environment
    env.close()

run(training=False)
I tried lowering the reward when more steps are taken, e.g. subtracting 0.01 from the reward for each step taken once the goal is found. I expected that to push the agent toward taking fewer steps, but it kept taking unnecessary steps anyway. Lowering the reward on every step even when the goal has not been found also seems like an idea, but I don't think you can do that, since the reward would become negative.
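For reference, the step penalty I tried looked roughly like this (just a sketch using a hypothetical wrapper class, not my exact code):

import gymnasium as gym

class GoalStepPenalty(gym.Wrapper):
    """Hypothetical wrapper (sketch): shrink the goal reward by a small penalty
    per step taken, so shorter successful episodes get a higher return."""
    def __init__(self, env, penalty=0.01):
        super().__init__(env)
        self.penalty = penalty
        self.steps = 0

    def reset(self, **kwargs):
        self.steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps += 1
        if terminated and reward > 0:
            # Goal reached: subtract 0.01 per step, but never go below 0
            reward = max(0.0, float(reward) - self.penalty * self.steps)
        return obs, reward, terminated, truncated, info

env = GoalStepPenalty(gym.make("FrozenLake-v1"))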
Upvotes: 0
Views: 119
Reputation: 98
Isn't the problem is_slippery? is_slippery is set to True by default, which makes the agent move in an unintended (perpendicular) direction with probability 2/3.
See the description:
"The lake is slippery (unless disabled) so the player may move perpendicular to the intended direction sometimes (see is_slippery)."
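You can check this directly by inspecting the environment's transition table (a sketch; the toy-text FrozenLakeEnv stores its transition model in the P attribute of the unwrapped env):

import gymnasium as gym

env = gym.make("FrozenLake-v1")  # is_slippery defaults to True
# P[state][action] is a list of (probability, next_state, reward, terminated) tuples
for prob, next_state, reward, terminated in env.unwrapped.P[0][2]:  # state 0, action 2 (RIGHT)
    print(prob, next_state, reward, terminated)
env.close()

With the default slippery dynamics each of the three printed transitions has probability 1/3, so the intended move only happens a third of the time.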
You can turn this off by setting:
env = gym.make("FrozenLake-v1", render_mode='human', is_slippery=False)
or
env = gym.make("FrozenLake-v1", is_slippery=False)
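Note that after adding is_slippery=False to both gym.make calls, you need to retrain before watching the agent, because the saved Q table was learned on the slippery dynamics. A minimal usage sketch with your run() function:

run(training=True)   # relearn and save q_table.csv on the non-slippery map
run(training=False)  # then load it and watch; the greedy policy should take a short path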
Upvotes: 0