Reputation: 88
In OpenAI Gym, I would like to know the next states for different actions on the same state. For example, I want to get s_1, s_2 where the dynamics of my environment are:
(s, a_1) -> s_1, (s, a_2) -> s_2
I cannot find a method that undoes an action or returns the next state without changing the environment. Is there something obvious that I'm missing?
If it helps, I am doing this to differentiate the dynamics and reward for LQR, and I am using the InvertedPendulum environment.
Upvotes: 3
Views: 1216
Reputation: 1
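For Atari you can use clone_full_state and restore_full_state:
def true_predict(env, a):
    # Save the emulator state, take the step, then roll the emulator back.
    old_state = env.unwrapped.clone_full_state()
    state = env.step(a)  # (observation, reward, done, info)
    env.unwrapped.restore_full_state(old_state)
    # Alternative: clone_state() / restore_state(), used the same way:
    # old_state = env.unwrapped.clone_state()
    # state = env.step(a)
    # env.unwrapped.restore_state(old_state)
    return state
import gym
env1 = gym.make("Pong-v4")
s = env1.reset()
for i in range(10):
    env1.step(0)
pred = true_predict(env1, 0)  # peek at the next step without advancing env1
new = env1.step(0)            # actually take the same step
# Should be 0 if the peeked observation matches the real one.
(pred[0] - new[0]).mean()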
Upvotes: 0
Reputation: 150
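Try cloning the env, then step each copy with a different action.
from copy import deepcopy
import gym
env1 = gym.make("InvertedPendulum-v1")
s = env1.reset()
env2 = deepcopy(env1)  # independent copy, still in state s
# a_1 and a_2 are the two actions you want to compare
s_1 = env1.step(a_1)   # step returns (observation, reward, done, info)
s_2 = env2.step(a_2)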
Upvotes: 0
Reputation: 88
I found a method named set_state that does exactly this. It can be found at: https://github.com/openai/gym/blob/12e8b763d5dcda4962cbd17887d545f0eec6808a/gym/envs/mujoco/mujoco_env.py#L86-L92
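For anyone who wants to see the pattern, here is a minimal sketch of how set_state can be used to try several actions from the same state. Since I can't assume MuJoCo is installed, ToyPointMass below is a hypothetical stand-in that only mimics the set_state(qpos, qvel) and step(action) interface; next_states and action_jacobian are helper names I made up for illustration. With a real MuJoCo env you would pass env.unwrapped instead, and how you read the current qpos and qvel depends on your gym / mujoco-py version, so treat that part as an assumption.

```python
import numpy as np


class ToyPointMass:
    """Hypothetical stand-in for a MuJoCo env (not part of gym).

    It only mimics the two calls used below: set_state(qpos, qvel) and step(action).
    """

    dt = 0.1

    def __init__(self):
        self.qpos = np.zeros(1)
        self.qvel = np.zeros(1)

    def set_state(self, qpos, qvel):
        # Overwrite the simulator state, like MujocoEnv.set_state does.
        self.qpos = np.array(qpos, dtype=float)
        self.qvel = np.array(qvel, dtype=float)

    def step(self, action):
        # Simple point-mass dynamics: update velocity, then position.
        self.qvel = self.qvel + self.dt * np.asarray(action, dtype=float)
        self.qpos = self.qpos + self.dt * self.qvel
        obs = np.concatenate([self.qpos, self.qvel])
        reward = -float(obs @ obs)
        return obs, reward, False, {}


def next_states(env, qpos, qvel, actions):
    """Return the observation reached from (qpos, qvel) under each action."""
    results = []
    for a in actions:
        env.set_state(qpos, qvel)  # rewind to the same state s
        obs, reward, done, info = env.step(a)
        results.append(obs)
    env.set_state(qpos, qvel)      # leave the env in state s afterwards
    return results


def action_jacobian(env, qpos, qvel, action, eps=1e-4):
    """Central finite-difference estimate of d(next observation)/d(action)."""
    action = np.asarray(action, dtype=float)
    cols = []
    for i in range(action.size):
        delta = np.zeros_like(action)
        delta[i] = eps
        plus, minus = next_states(env, qpos, qvel, [action + delta, action - delta])
        cols.append((plus - minus) / (2 * eps))
    return np.stack(cols, axis=1)  # shape: (obs_dim, action_dim)


env = ToyPointMass()
qpos, qvel = np.array([0.5]), np.array([0.0])
a_1, a_2 = np.array([1.0]), np.array([-1.0])

s_1, s_2 = next_states(env, qpos, qvel, [a_1, a_2])
B = action_jacobian(env, qpos, qvel, np.array([0.0]))
```

The same rewind-then-step idea gives the state Jacobian for LQR if you perturb qpos and qvel instead of the action. Note that the real set_state checks the shapes of qpos and qvel, so pass arrays of the sizes the model expects.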
Upvotes: 2