Reputation: 7608
Although I can get the examples and my own code to run, I am more curious about the real semantics and expectations behind the OpenAI Gym API, in particular Env.reset().
When is reset expected/required? At the end of each episode? Or only after creating an environment?
I think it makes sense to call it before each episode, but I have not found that stated explicitly anywhere!
Upvotes: 5
Views: 7369
Reputation: 127
Think of it simply: env.reset()
just resets the whole environment, so you need to call it for each episode.
This is an example of a reset function inside a custom environment. In this case it just resets the enemy position and the time.
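A minimal sketch of what such a reset could look like, assuming a toy environment written from scratch. The class name `ChaseEnv`, the 0..9 track, the `MAX_TIME` limit and the random enemy movement are all hypothetical choices made for illustration, not part of Gym. The point is only that reset() restores every piece of per-episode state and returns the first observation:

```python
import random


class ChaseEnv:
    """Hypothetical toy environment (not part of Gym)."""

    MAX_TIME = 50  # assumed step limit per episode

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.enemy_pos = None
        self.time = None

    def reset(self):
        # Restore all per-episode state: enemy position and elapsed time.
        self.enemy_pos = self.rng.randint(0, 9)
        self.time = 0
        return self._observation()

    def step(self, action):
        # The action is ignored in this sketch; the enemy drifts randomly
        # along the 0..9 track and the clock advances by one step.
        move = self.rng.choice([-1, 0, 1])
        self.enemy_pos = max(0, min(9, self.enemy_pos + move))
        self.time += 1
        done = self.time >= self.MAX_TIME
        reward = 0.0
        return self._observation(), reward, done, {}

    def _observation(self):
        return (self.enemy_pos, self.time)
```

After an episode ends (done is True), calling reset() again puts the time back to 0 and places the enemy at a fresh position, which is why it has to be called once per episode.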
I hope seeing what is inside the environment gives you a better understanding.
Sorry for the late response.
Upvotes: 0
Reputation: 1048
You typically call reset after an entire episode. That could be after you have reached a terminal state in the MDP, or after you have reached your maximum number of time steps (set by you). I also typically reset at the very start of training.
So if you are at your starting state 'A' and you want to reach state 'Z', you would run your time steps going from 'A' -> 'B' -> 'C' ..., then when you reach the terminal state 'Z', you start a new episode using reset, which takes you back to 'A'.
for episode in range(iterations):
    state = env.reset()  # first state of the new episode
    for time_step in range(1000):  # maximum number of steps per episode
        action = take_action(state)
        state, reward, done, _ = env.step(action)
        if done:
            break  # move on to the next episode, where the environment is reset
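To make the loop above runnable without installing anything, here is a sketch with a stand-in environment for the 'A' to 'Z' example. `ChainEnv`, the always-move-forward `take_action` policy and the `run` helper are hypothetical names introduced for illustration. The environment follows the same four-value step() return used in the loop above:

```python
import string


class ChainEnv:
    """Hypothetical stand-in environment: states 'A'..'Z', with 'Z' terminal."""

    STATES = string.ascii_uppercase

    def reset(self):
        # Go back to the starting state 'A'.
        self.index = 0
        return self.STATES[self.index]

    def step(self, action):
        # action is 1 to move one state forward, 0 to stay in place.
        self.index = min(self.index + action, len(self.STATES) - 1)
        done = self.index == len(self.STATES) - 1
        reward = 1.0 if done else 0.0
        return self.STATES[self.index], reward, done, {}


def take_action(state):
    # Placeholder policy (an assumption): always move forward.
    return 1


def run(env, episodes=3, max_steps=1000):
    """Run several episodes and return the path of states visited in each."""
    paths = []
    for episode in range(episodes):
        state = env.reset()  # every episode starts from 'A' again
        path = [state]
        for time_step in range(max_steps):
            action = take_action(state)
            state, reward, done, _ = env.step(action)
            path.append(state)
            if done:
                break  # terminal state reached; the next episode resets
        paths.append("".join(path))
    return paths
```

Calling `run(ChainEnv())` returns one path per episode, each starting at 'A' because reset() is called before the first step. Note that newer Gym releases (0.26 and later) and Gymnasium return `(observation, info)` from reset() and five values from step(), so the unpacking would need adjusting there.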
Upvotes: 6