Oren

Reputation: 1006

stable-baselines3, gym: train while also calling step/predict

With stable-baselines3, given an agent we can call "action, _states = agent.predict(obs)". Then with Gym, this is "new_obs, reward, done, info = env.step(action)".
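For concreteness, here is roughly the loop I mean (a minimal sketch, assuming a classic Gym API with a single "done" flag; newer Gymnasium splits it into "terminated" and "truncated"):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
agent = PPO("MlpPolicy", env)

obs = env.reset()
done = False
while not done:
    # predict returns the action and the recurrent hidden states (if any)
    action, _states = agent.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```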

We also have "agent.learn(10_000)", for example, but there we are much less involved in the process and never call the environment ourselves.

I'm looking for a way to train the agent while still calling "env.step" myself. If you wonder why: I'm trying to implement self-play (the agent against a previous version of itself) in a single environment (for example, turn-based play such as chess).

WKR, Oren.

Upvotes: 0

Views: 611

Answers (1)

gehirndienst

Reputation: 463

But why do you need it? If you take a look at the implementation of any learn method, you will see that it is nothing more than an iteration over timesteps that calls collect_rollouts and train, plus some logging and setup at the beginning (e.g., for saving the agent later). Your env.step is called inside collect_rollouts.
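Paraphrased, the on-policy learn loop looks roughly like this (an abridged sketch, not the actual source, which also threads callbacks and logging through every step):

```python
# Simplified paraphrase of stable-baselines3's on-policy learn loop
def learn(self, total_timesteps):
    self._setup_learn(total_timesteps)            # logging/bookkeeping setup
    while self.num_timesteps < total_timesteps:
        # collect_rollouts steps the environment internally:
        # it calls env.step(action) until the rollout buffer is full
        self.collect_rollouts(self.env, self.rollout_buffer)
        self.train()                              # gradient updates on the buffer
    return self
```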

I would rather suggest writing a callback based on CheckpointCallback, which saves your agent (model) every N training steps, and attaching it to your learn call. In your environment you could then instantiate a "new previous" version of your model every N steps by calling ModelClass.load(file) on the file saved by the callback, so that you end up selecting the other player's actions via self-play inside your environment.
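A sketch of that setup. CheckpointCallback and PPO.load are real stable-baselines3 APIs; SelfPlayChessEnv, the paths, and the helper function are hypothetical placeholders for your own environment:

```python
import glob
import os

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# 1) Training side: save a checkpoint of the model every N env steps.
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,            # N training steps between saves
    save_path="./checkpoints/",
    name_prefix="selfplay",      # files become selfplay_<steps>_steps.zip
)

env = SelfPlayChessEnv()         # hypothetical turn-based environment
agent = PPO("MlpPolicy", env)
agent.learn(total_timesteps=200_000, callback=checkpoint_callback)

# 2) Environment side: inside SelfPlayChessEnv.step(), periodically
#    reload the newest checkpoint and let it choose the opponent's move.
def load_latest_opponent():
    files = glob.glob("./checkpoints/selfplay_*_steps.zip")
    if not files:
        return None              # no checkpoint yet: e.g. move randomly
    return PPO.load(max(files, key=os.path.getmtime))

# opponent = load_latest_opponent()
# opponent_action, _ = opponent.predict(opponent_obs, deterministic=True)
```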

Upvotes: 0
