Reputation: 2167
I want to know the specification of the observation of CartPole-v0 in OpenAI Gym (https://gym.openai.com/).
For example, the following code prints each observation. One observation looks like [-0.061586 -0.75893141 0.05793238 1.15547541].
I want to know what the numbers mean. I would also like a way to find the specifications of other environments, such as MountainCar-v0, MsPacman-v0, and so on.
I tried reading https://github.com/openai/gym, but I could not find this information there. Could you tell me how to find the specifications?
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    # Start a new episode and get the initial observation
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        # Pick a random action and advance the simulation by one step
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
(from https://gym.openai.com/docs)
The output is the following:
[-0.061586 -0.75893141 0.05793238 1.15547541]
[-0.07676463 -0.95475889 0.08104189 1.46574644]
[-0.0958598 -1.15077434 0.11035682 1.78260485]
[-0.11887529 -0.95705275 0.14600892 1.5261692 ]
[-0.13801635 -0.7639636 0.1765323 1.28239155]
[-0.15329562 -0.57147373 0.20218013 1.04977545]
Episode finished after 14 timesteps
[-0.02786724 0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191 0.26388759]
[-0.03161324 0.00474768 -0.03443415 -0.04105167]
Upvotes: 9
Views: 8913
Reputation: 110
The observation space used in OpenAI Gym is not exactly the same as the one in the original paper. Look at OpenAI's wiki to find the answer. The observation space is 4-dimensional, and each dimension is as follows:
Num  Observation           Min                  Max
0    Cart Position         -4.8                 4.8
1    Cart Velocity         -Inf                 Inf
2    Pole Angle            ~ -0.418 rad (-24°)  ~ 0.418 rad (24°)
3    Pole Velocity At Tip  -Inf                 Inf
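The bounds can also be read from the environment object itself, which works for other environments too. Below is a minimal sketch, assuming Gym's Box spaces expose `low`, `high`, and `shape` and its Discrete spaces expose `n`. The helper names `summarize_space` and `show_env_spec` are made up for this illustration, and bounds are listed only for 1-D spaces (image observations report just their shape).

```python
def summarize_space(space):
    """Describe a Gym-style space as a plain dict (duck-typed, hypothetical helper)."""
    if hasattr(space, "n"):
        # Discrete space: actions or observations are integers 0 .. n-1
        return {"kind": "discrete", "n": int(space.n)}
    if hasattr(space, "low") and hasattr(space, "high"):
        summary = {"kind": "box", "shape": tuple(space.shape)}
        if len(summary["shape"]) == 1:
            # Only list per-dimension bounds for flat vectors such as CartPole's
            summary["low"] = [float(v) for v in space.low]
            summary["high"] = [float(v) for v in space.high]
        return summary
    # Anything else: fall back to the class name
    return {"kind": type(space).__name__}


def show_env_spec(env_id):
    """Print the observation and action spaces of one environment."""
    import gym  # assumption: the classic `gym` package is installed

    env = gym.make(env_id)
    print(env_id, "observation:", summarize_space(env.observation_space))
    print(env_id, "action:", summarize_space(env.action_space))
    env.close()
```

With Gym installed, calling `show_env_spec('CartPole-v0')` or `show_env_spec('MountainCar-v0')` prints the spaces for that environment. The spaces give only shapes and bounds; the meaning of each dimension still has to come from the environment's documentation or source.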
Upvotes: 9
Reputation: 6689
On the OpenAI Gym website, the paragraph describing each environment is followed by a reference that explains the environment in detail. For example, for CartPole-v0 you can find all the details in:
[Barto83] AG Barto, RS Sutton and CW Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Transactions on Systems, Man, and Cybernetics, 1983.
In that paper you can read that the cart-pole has four state variables: the cart position, the cart velocity, the pole angle, and the rate of change of the pole angle.
So, the observation is simply a vector with the values of the four state variables.
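To make that concrete, here is a small sketch that attaches names to one observation vector. It assumes the order used by Gym's CartPole implementation (position, velocity, angle, angular velocity) and that the angle is given in radians. The field names and the helper `label_observation` are invented for this example and are not part of Gym.

```python
import math

# Labels chosen for this sketch; the order is assumed to match Gym's CartPole.
FIELDS = (
    "cart_position",
    "cart_velocity",
    "pole_angle_rad",
    "pole_angular_velocity",
)


def label_observation(obs):
    """Map a 4-element CartPole observation to a dict of named values."""
    if len(obs) != len(FIELDS):
        raise ValueError("expected %d values, got %d" % (len(FIELDS), len(obs)))
    labeled = dict(zip(FIELDS, (float(v) for v in obs)))
    # Degrees are added only for readability; the raw value stays in radians.
    labeled["pole_angle_deg"] = math.degrees(labeled["pole_angle_rad"])
    return labeled


# First observation from the question's output
first = label_observation([-0.061586, -0.75893141, 0.05793238, 1.15547541])
for name, value in first.items():
    print(name, value)
```

The same function accepts the NumPy array returned by `env.step`, since it only needs a length and iterable values.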
Similarly, the details of MountainCar-v0 can be found in:
[Moore90] A Moore, Efficient Memory-Based Learning for Robot Control, PhD thesis, University of Cambridge, 1990.
and so on.
Upvotes: 4