Jelles

Reputation: 38

Q Learning agent taking too many steps to reach goal

I'm currently working on implementing Q-learning for the FrozenLake-v1 environment in Gymnasium. However, my agent takes a lot of unnecessary steps to reach the goal. I've reviewed my code multiple times, but I can't pinpoint the issue.

Here's the code I'm using:

import random
import numpy as np
import gymnasium as gym

def argmax(arr):
    arr_max = np.max(arr)
    return np.random.choice(np.where(arr == arr_max)[0])


def save_q_table(Q):
    np.savetxt("q_table.csv", Q, delimiter=",")


def load_q_table():
    return np.loadtxt("q_table.csv", delimiter=",")


def run(training):
    if not training:
        env = gym.make("FrozenLake-v1", render_mode='human')
    else:
        env = gym.make("FrozenLake-v1")

    Q = np.zeros((env.observation_space.n, env.action_space.n))  # empty q_table

    if not training:
        Q = load_q_table()

    alpha = 0.8
    gamma = 0.95
    episode = 0
    episodes = 10000
    epsilon = 0.95
    epsilon_decay = (2 * epsilon) / episodes
    epsilon_min = 0.05
    env.metadata['render_fps'] = 10

    state, info = env.reset()

    while episode < episodes:

        if random.random() < epsilon and training:
            action = env.action_space.sample()
        else:
            action = argmax(Q[state])

        new_state, reward, terminated, truncated, info = env.step(action)

        if training:
            Q[state, action] = Q[state, action] + alpha * (
                        float(reward) + gamma * np.max(Q[new_state]) - Q[state, action])

        state = new_state

        if terminated or truncated:

            if epsilon > epsilon_min:
                epsilon -= epsilon_decay

            episode += 1

            # save on last episode
            if training and episode == episodes:
                print("Saving Q table")
                save_q_table(Q)

            print("Episode: ", episode, "Epsilon: ", round(epsilon, 2), "Reward: ", reward)

            state, info = env.reset()  # Reset the environment

    env.close()


run(training=False)

I tried lowering the reward when more steps are taken, e.g. subtracting 0.01 from the reward for each step once the goal is found. I expected that to teach the agent to take fewer steps, but it kept wandering anyway. Penalizing every step even when the goal has not been found also seems like an idea, but I don't think you can do that, since the reward would become negative.
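For reference, the per-step penalty I tried looked roughly like this in my update (the 0.01 value is just something I picked, not part of the environment's reward):

# Sketch of the reward shaping I experimented with: subtract a small
# penalty from the reward before the Q-update. The penalty size is arbitrary.
step_penalty = 0.01
shaped_reward = float(reward) - step_penalty  # can go negative

Q[state, action] = Q[state, action] + alpha * (
    shaped_reward + gamma * np.max(Q[new_state]) - Q[state, action])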

Upvotes: 0

Views: 119

Answers (1)

HyeAnn

Reputation: 98

Isn't it the problem of is_slippery? is_slippery is set to True by default, which makes the agent move in an unintended (perpendicular) direction with probability 2/3. See the description:

The lake is slippery (unless disabled) so the player may move perpendicular to the intended direction sometimes (see is_slippery).

You can turn this off by setting:

env = gym.make("FrozenLake-v1", render_mode='human', is_slippery=False)

or

env = gym.make("FrozenLake-v1", is_slippery=False)
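If you want to verify the transition probabilities yourself, you can inspect the environment's transition table (this is a quick check I would do; the state/action indices below are just an example):

import gymnasium as gym

env = gym.make("FrozenLake-v1")  # is_slippery=True by default
# P[state][action] is a list of (probability, next_state, reward, terminated)
# e.g. state 0, action 2 (move right) on the slippery lake:
print(env.unwrapped.P[0][2])
# You should see three entries with probability 1/3 each,
# and only one of them corresponds to the intended move.
env.close()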

Upvotes: 0
