Reputation: 38
I'm currently working on implementing Q-learning for the FrozenLake-v1 environment in Gymnasium (OpenAI Gym). However, my agent seems to take a lot of unnecessary steps to reach the goal. I've reviewed my code multiple times, but I can't pinpoint the issue.
Here's the code I'm using:
import random
import numpy as np
import gymnasium as gym

def argmax(arr):
    # Break ties randomly among the maximal entries
    arr_max = np.max(arr)
    return np.random.choice(np.where(arr == arr_max)[0])

def save_q_table(Q):
    np.savetxt("q_table.csv", Q, delimiter=",")

def load_q_table():
    return np.loadtxt("q_table.csv", delimiter=",")

def run(training):
    if not training:
        env = gym.make("FrozenLake-v1", render_mode='human')
    else:
        env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))  # empty q_table
    if not training:
        Q = load_q_table()
    alpha = 0.8
    gamma = 0.95
    episode = 0
    episodes = 10000
    epsilon = 0.95
    epsilon_decay = (2 * epsilon) / episodes
    epsilon_min = 0.05
    env.metadata['render_fps'] = 10
    state, info = env.reset()
    while episode < episodes:
        # Epsilon-greedy action selection (explore only while training)
        if random.random() < epsilon and training:
            action = env.action_space.sample()
        else:
            action = argmax(Q[state])
        new_state, reward, terminated, truncated, info = env.step(action)
        if training:
            # Q-learning update
            Q[state, action] = Q[state, action] + alpha * (
                float(reward) + gamma * np.max(Q[new_state]) - Q[state, action])
        state = new_state
        if terminated or truncated:
            # Decay epsilon once per finished episode
            if epsilon > epsilon_min:
                epsilon -= epsilon_decay
            episode += 1
            # save on last episode
            if training and episode == episodes:
                print("Saving Q table")
                save_q_table(Q)
            print("Episode: ", episode, "Epsilon: ", round(epsilon, 2), "Reward: ", reward)
            state, info = env.reset()  # Reset the environment
    env.close()

run(training=False)
I tried lowering the reward when more steps are taken, e.g. subtracting 0.01 from the reward for each step taken once the goal is found. I expected that to push the agent toward taking fewer steps, but it kept taking unnecessary steps anyway. Lowering the reward on every step even when the goal has not been found also seems like an idea, but I don't think you can do that, since the reward would become negative.
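For reference, the step penalty I tried looked roughly like this (just a sketch using a hypothetical wrapper class, not my exact code):

import gymnasium as gym

class GoalStepPenalty(gym.Wrapper):
    """Hypothetical wrapper (sketch): shrink the goal reward by a small penalty
    per step taken, so shorter successful episodes get a higher return."""
    def __init__(self, env, penalty=0.01):
        super().__init__(env)
        self.penalty = penalty
        self.steps = 0

    def reset(self, **kwargs):
        self.steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps += 1
        if terminated and reward > 0:
            # Goal reached: subtract 0.01 per step, but never go below 0
            reward = max(0.0, float(reward) - self.penalty * self.steps)
        return obs, reward, terminated, truncated, info

env = GoalStepPenalty(gym.make("FrozenLake-v1"))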
Upvotes: 0
Views: 119
Reputation: 98
Isn't the problem is_slippery? is_slippery is set to True by default, which makes the agent move in an unintended (perpendicular) direction with probability 2/3.
See the description:
"The lake is slippery (unless disabled) so the player may move perpendicular to the intended direction sometimes (see is_slippery)."
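You can check this directly by inspecting the environment's transition table (a sketch; the toy-text FrozenLakeEnv stores its transition model in the P attribute of the unwrapped env):

import gymnasium as gym

env = gym.make("FrozenLake-v1")  # is_slippery defaults to True
# P[state][action] is a list of (probability, next_state, reward, terminated) tuples
for prob, next_state, reward, terminated in env.unwrapped.P[0][2]:  # state 0, action 2 (RIGHT)
    print(prob, next_state, reward, terminated)
env.close()

With the default slippery dynamics each of the three printed transitions has probability 1/3, so the intended move only happens a third of the time.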
You can turn this off by setting:
env = gym.make("FrozenLake-v1", render_mode='human', is_slippery=False)
or
env = gym.make("FrozenLake-v1", is_slippery=False)
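Note that after adding is_slippery=False to both gym.make calls, you need to retrain before watching the agent, because the saved Q table was learned on the slippery dynamics. A minimal usage sketch with your run() function:

run(training=True)   # relearn and save q_table.csv on the non-slippery map
run(training=False)  # then load it and watch; the greedy policy should take a short path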
Upvotes: 0