Reputation: 1
I'm new to gym and I tried to write a simple Q-learning program, but for some (weird) reason I can't get rid of the rendering part (which takes forever)...
Here is my program:
import gymnasium as gym
import numpy as np

env = gym.make("MountainCar-v0", render_mode="human")

LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 500

DISCRETE_OS_SIZE = [20] * len(env.observation_space.low)
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE

q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))

def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low) / discrete_os_win_size
    return tuple(discrete_state.astype(int))

for episode in range(EPISODES):
    if episode % SHOW_EVERY == 0:
        render = True
    else:
        render = False
        print("Episode:", episode)
    discrete_state = get_discrete_state(tuple(env.reset()[0].astype(int)))
    done = False
    while not done:
        action = np.argmax(q_table[discrete_state])
        new_state, reward, terminated, truncated, _ = env.step(action)
        done = truncated or terminated
        new_discrete_state = get_discrete_state(new_state)
        # Rendering the episode
        # (Even removing this part does not help)
        if render:
            env.render()
        if not done:
            # Updating the Q-table
            max_future_q = np.max(q_table[new_discrete_state])
            current_q = q_table[discrete_state + (action, )]
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
            q_table[discrete_state + (action, )] = new_q
        # If the car made it to the goal
        elif new_state[0] >= env.unwrapped.goal_position:
            q_table[discrete_state + (action, )] = 0
            print("MADE IT ON EPISODE:", episode)
        discrete_state = new_discrete_state
env.close()
I tried:
- removing the env.render() part: did not work
- removing discrete_state at the start and replacing it by the default value (13, 10) by hand: kinda worked (episodes not rendering, but neither were the ones where render is True)
Upvotes: 0
Views: 146
Reputation: 98
In the Gymnasium Documentation, it says:

By convention, if the render_mode is:
- "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn't need to be called. Returns None.

As long as you set the render_mode as "human", it is inevitable that every step will be rendered.
Upvotes: 0