Reputation: 1006
With stable-baselines3, given an agent we can call "action = agent.predict(obs)", and then with Gym, "new_obs, reward, done, info = env.step(action)" (more or less; I may have missed an input or an output).
We also have "agent.learn(10_000)", but here we're far less involved in the process and never call the environment ourselves.
I'm looking for a way to train the agent while still calling "env.step" myself. If you wonder why: I'm trying to implement self-play (the agent against a previous version of itself) inside one environment (for example a turn-based game such as chess).
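Roughly what I mean, as a sketch (assuming PPO and the classic Gym step API; with Gymnasium, reset/step return slightly different tuples):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
agent = PPO("MlpPolicy", env)

# Case 1: I drive the loop and call env.step myself.
obs = env.reset()
done = False
while not done:
    action, _state = agent.predict(obs)         # predict returns (action, hidden state)
    obs, reward, done, info = env.step(action)  # classic Gym 4-tuple

# Case 2: the library drives the loop; env.step happens somewhere inside learn.
agent.learn(total_timesteps=10_000)
```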
WKR, Oren.
Upvotes: 0
Views: 611
Reputation: 463
But why do you need it? If you take a look at the implementation of any learn method, you will see it is nothing more than an iteration over time steps that calls collect_rollouts and train, plus some logging and setup at the beginning (e.g., for saving the agent later). Your env.step is called inside collect_rollouts.
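Schematically (the names below are illustrative stand-ins, not the actual SB3 internals), learn boils down to a loop like this:

```python
def learn_sketch(total_timesteps, collect_rollouts, train):
    """Illustrative shape of learn(): alternate rollout collection and gradient updates."""
    num_timesteps = 0
    while num_timesteps < total_timesteps:
        # collect_rollouts runs the current policy in the env -- env.step is called here
        num_timesteps += collect_rollouts()
        # train performs gradient updates on the freshly collected data
        train()
```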
I would instead suggest writing a callback based on CheckpointCallback, which saves your agent (model) every N training steps, and attaching that callback to your learn call. In your environment you could then, every N steps, instantiate a "new previous" version of your model by calling ModelClass.load(file) on the file saved by the callback, so that you can select the other player's actions and get self-play inside your environment.
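A minimal sketch of that setup, assuming PPO; my_two_player_env, opponent_obs and the paths are placeholders for your own code:

```python
import glob
import os

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Snapshot the model every N steps (N, path and prefix are just examples)
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./checkpoints/",
    name_prefix="selfplay",
)

model = PPO("MlpPolicy", my_two_player_env)  # your own turn-based environment
model.learn(total_timesteps=1_000_000, callback=checkpoint_callback)

# Inside the environment: periodically reload the opponent (the "previous" agent)
# from the newest checkpoint and use it to pick the other player's moves.
checkpoints = sorted(glob.glob("./checkpoints/selfplay_*_steps.zip"), key=os.path.getmtime)
if checkpoints:
    opponent = PPO.load(checkpoints[-1])
    opponent_action, _ = opponent.predict(opponent_obs, deterministic=True)
```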
Upvotes: 0