Reputation: 136389
I've created a very simple OpenAI gym (banana-gym
) and wonder if / how I should implement env.seed(0)
.
See https://github.com/openai/gym/issues/250#issuecomment-234126816 for example.
Upvotes: 10
Views: 25505
Reputation: 1366
env.seed(seed) works like a charm. The key is to seed the env not just at the beginning, but EVERY time the reset() function is called. Since we invariably end up playing multiple games during training this seeding function should be inside one of the loops and will be executed multiple times. Possibly this is the reason why it is deprecated now in favor of env.reset(seed=seed)
Of course needless to say, if you are using randomness in the agent, you need to seed that as well. In that case, seeding once at the start of the training would be fine. You may want to seed the NN framework as well.. A typical seeding function (Pytorch) would be:
def seed_everything(seed):
random.seed(seed)
np.random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
env.seed(seed)
##One call at beginning is enough
seed_everything(SEED)
Howver remember to call env.seed every time you reset the env:
curr_state = env.reset()
env.seed(SEED)
or simply use the new API: env.reset(seed=seed)
BUT - While this determinism may be used in early training to debug your code, it is recommended not to use the the same(ie env.seed(SEED)) in your final training. This is because, by nature, the start position of the environment is supposed to be random and your RL code is expected to function considering that randomness. If you make your start position deterministic then your model will not perform well in the live environment
Upvotes: 4
Reputation: 96
In a recent merge, the developers of OpenAI gym changed the behavior of env.seed()
to not call the method env._seed()
anymore. Instead the method now just issues a warning and returns. I think if you want to use this method to set the seed of your environment, you should just overwrite it now.
Upvotes: 4
Reputation: 8488
The docstring of the env.seed()
function (which can be found in this file) provides the following documentation on what the function should be implemented to do:
Sets the seed for this env's random number generator(s).
Note:
Some environments use multiple pseudorandom number generators.
We want to capture all such seeds used in order to ensure that
there aren't accidental correlations between multiple generators.
Returns:
list<bigint>: Returns the list of seeds used in this env's random
number generators. The first value in the list should be the
"main" seed, or the value which a reproducer should pass to
'seed'. Often, the main seed equals the provided 'seed', but
this won't be true if seed=None, for example.
Note that, unlike what the documentation and the comments in the issue you linked to seem to imply, it doesn't seem (to me) like env.seed()
is supposed to be overridden by custom environments. env.seed()
has a very simple implementation, where it only calls and returns the return value of env._seed()
, and it seems to me like that is the function which should be overridden by custom environments.
For example, OpenAI gym's atari environments have a custom _seed()
implementation which sets the seed used internally by the (C++
-based) Arcade Learning Environment.
Since you have a random.random()
call in your custom environment, you should probably implement _seed()
to call random.seed()
. In that way, users of your environments can reproduce experiments by making sure to call seed()
on your environment with the same argument.
Note: Messing around with the global random seed like this may be unexpected though, it may be better to create a dedicated random object when your environment gets initialized, seed that object, and make sure to always obtain your random numbers if you need them in the environment from that object.
Upvotes: 2