OpenAI gym: when is reset required?

Question

Although I can get the examples and my own code to run, I am more curious about the real semantics and expectations behind the OpenAI gym API, in particular Env.reset().

When is reset expected/required? At the end of each episode? Or only after creating an environment?

I rather think it makes sense before each episode, but I have not been able to find that stated explicitly anywhere!

Derek_M · Accepted Answer

You typically call reset after each episode ends. That could be after you reach a terminal state in the MDP, or after you reach your maximum number of time steps (set by you). I also typically reset at the very start of training.

So if you are at your starting state 'A' and you want to reach state 'Z', you would run your time steps going from 'A' -> 'B' -> 'C' ..., then when you reach the terminal state 'Z', you start a new episode using reset, which would take you back to 'A'.

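    import gym  # classic API (gym < 0.26): reset() returns the state, step() returns 4 values

    env = gym.make("CartPole-v0")  # example environment (an assumption)
    iterations = 10  # number of episodes (an assumption)
    take_action = lambda state: env.action_space.sample()  # placeholder random policy

    for episode in range(iterations):
        state = env.reset()  # first state of the new episode
        for time_step in range(1000):  # max number of steps per episode
            action = take_action(state)
            state, reward, done, _ = env.step(action)
            if done:
                break  # takes you to the next episode, where the environment is reset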
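
To make the 'A' to 'Z' example concrete without needing gym installed, here is a minimal sketch. `ChainEnv` and `run_episode` are made-up names for illustration, not part of gym; the class only mimics the classic reset()/step() interface used above, and raising an error when step() is called before reset() is an assumption of this sketch. It shows both reasons to reset: a terminal state was reached, or the step budget ran out.

```python
class ChainEnv:
    """Hypothetical toy environment (not part of gym) with a gym-like interface."""

    def __init__(self, states="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
        self.states = states
        self.position = None  # no valid state until reset() is called

    def reset(self):
        # Start a fresh episode at the first state and return it.
        self.position = 0
        return self.states[self.position]

    def step(self, action):
        if self.position is None:
            raise RuntimeError("call reset() before step()")
        # action is 1 (move one state forward) or 0 (stay put).
        last = len(self.states) - 1
        self.position = min(self.position + action, last)
        done = self.position == last  # 'Z' is the terminal state
        reward = 1.0 if done else 0.0
        return self.states[self.position], reward, done, {}


def run_episode(env, policy, max_steps):
    """Run one episode; return (visited states, whether the terminal state was reached)."""
    state = env.reset()  # every episode begins with reset()
    visited = [state]
    for _ in range(max_steps):
        state, reward, done, _ = env.step(policy(state))
        visited.append(state)
        if done:
            return visited, True
    return visited, False  # step budget exhausted before reaching 'Z'


env = ChainEnv()
always_forward = lambda state: 1  # placeholder policy: always move forward

# Episode 1 ends because the terminal state 'Z' is reached.
visited, reached_terminal = run_episode(env, always_forward, max_steps=1000)
print(visited[0], visited[-1], reached_terminal)  # A Z True

# Episode 2 ends because the step budget runs out; reset() still starts it at 'A'.
visited, reached_terminal = run_episode(env, always_forward, max_steps=5)
print(visited[0], visited[-1], reached_terminal)  # A F False
```

In both cases the next episode starts with reset(), which is also why the first call happens right after the environment is created.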
