Observations meaning - OpenAI Gym

I want to know the specification of the observation of CartPole-v0 in OpenAI Gym (https://gym.openai.com/).

For example, the following code prints the observation. One observation looks like [-0.061586 -0.75893141 0.05793238 1.15547541], and I want to know what the numbers mean. I would also like a way to find the specification of other environments such as MountainCar-v0, MsPacman-v0 and so on.

I tried reading https://github.com/openai/gym, but I could not find this information there. Could you tell me how to find the specifications?

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

(from https://gym.openai.com/docs)

The output is the following:

[-0.061586   -0.75893141  0.05793238  1.15547541]
[-0.07676463 -0.95475889  0.08104189  1.46574644]
[-0.0958598  -1.15077434  0.11035682  1.78260485]
[-0.11887529 -0.95705275  0.14600892  1.5261692 ]
[-0.13801635 -0.7639636   0.1765323   1.28239155]
[-0.15329562 -0.57147373  0.20218013  1.04977545]
Episode finished after 14 timesteps
[-0.02786724  0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191  0.26388759]
[-0.03161324  0.00474768 -0.03443415 -0.04105167]

Upvotes: 9

Answers (2)

RoastDuck

Reputation: 110

The observation space used in OpenAI Gym is not exactly the same as the one in the original paper. OpenAI's wiki has the answer: the observation space is 4-dimensional, and the dimensions are as follows:

Num  Observation           Min       Max
0    Cart Position         -2.4      2.4
1    Cart Velocity         -Inf      Inf
2    Pole Angle            ~ -41.8°  ~ 41.8°
3    Pole Velocity At Tip  -Inf      Inf
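
As a minimal sketch, the table can be turned into a small helper that attaches a name to each component of a CartPole observation. The component names come from the table; the helper name `label_observation` is hypothetical, and the degrees conversion assumes the pole angle in the observation is given in radians.

```python
import math

# Component names in index order, taken from the table above.
CARTPOLE_FIELDS = (
    "cart_position",
    "cart_velocity",
    "pole_angle",
    "pole_velocity_at_tip",
)


def label_observation(observation):
    """Map a 4-element CartPole observation to a dict keyed by component name."""
    values = [float(v) for v in observation]
    if len(values) != len(CARTPOLE_FIELDS):
        raise ValueError("expected 4 values, got {}".format(len(values)))
    labeled = dict(zip(CARTPOLE_FIELDS, values))
    # Assumption: the angle is in radians; add a degrees view for readability.
    labeled["pole_angle_degrees"] = math.degrees(labeled["pole_angle"])
    return labeled


# First observation printed in the question.
first = label_observation([-0.061586, -0.75893141, 0.05793238, 1.15547541])
for name, value in first.items():
    print("{:>22}: {: .5f}".format(name, value))
```

Read this way, the first observation in the question describes a cart slightly left of center and moving left, with the pole tilted a few degrees and rotating further in the same direction.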

Upvotes: 9

Pablo EM

Reputation: 6689

On the OpenAI Gym website, the paragraph describing each environment is always followed by a reference that explains the environment in detail. For example, in the case of CartPole-v0, you can find all the details in:

[Barto83] AG Barto, RS Sutton and CW Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Transactions on Systems, Man, and Cybernetics, 1983.

In that paper you can read that the cart-pole has four state variables:

position of the cart on the track
angle of the pole with the vertical
cart velocity
rate of change of the angle

So the observation is simply a vector holding the values of these four state variables.

Similarly, the details of MountainCar-v0 can be found in

[Moore90] A Moore, Efficient Memory-Based Learning for Robot Control, PhD thesis, University of Cambridge, 1990.

and so on.
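
Besides reading the references, the shape and bounds of an environment's observations can be inspected from code. The following is a rough sketch, not an official recipe: `describe_space` and `describe_env` are hypothetical helper names, and the sketch assumes the classic Gym API in which `gym.make` returns an environment exposing `observation_space` and `action_space`. The bounds reported by an installed version may differ from values copied from a wiki or paper.

```python
def describe_space(space):
    """Summarize a Gym-style space using only the attributes it exposes."""
    summary = {"type": type(space).__name__}
    # Box spaces expose shape/low/high; Discrete spaces expose n.
    for attr in ("shape", "low", "high", "n"):
        if hasattr(space, attr):
            summary[attr] = getattr(space, attr)
    return summary


def describe_env(env_id):
    """Print the observation and action spaces of a registered environment."""
    import gym  # imported lazily so describe_space works without Gym installed

    env = gym.make(env_id)
    try:
        print(env_id)
        print("  observation:", describe_space(env.observation_space))
        print("  action:     ", describe_space(env.action_space))
    finally:
        env.close()
```

Calling `describe_env("CartPole-v0")` or `describe_env("MountainCar-v0")` prints the type, shape and bounds of each space. This tells you how many numbers an observation contains and their ranges, but not what each one means, so the references above are still needed for the interpretation.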

Upvotes: 4
