Reputation: 1006
With stable-baselines3, given an agent we can call "action = agent.predict(obs)", and then with Gym, "new_obs, reward, done, info = env.step(action)" (more or less; I may have missed an input or an output).
We also have "agent.learn(10_000)", but here we're far less involved in the process and never call the environment ourselves.
I'm looking for a way to train the agent while still calling "env.step" myself. If you wonder why: I'm trying to implement self-play (the agent against a previous version of itself) inside one environment (for example a turn-based game such as chess).
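Roughly what I mean, as a sketch (assuming PPO and the classic Gym step API; with Gymnasium, reset/step return slightly different tuples):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
agent = PPO("MlpPolicy", env)

# Case 1: I drive the loop and call env.step myself.
obs = env.reset()
done = False
while not done:
    action, _state = agent.predict(obs)         # predict returns (action, hidden state)
    obs, reward, done, info = env.step(action)  # classic Gym 4-tuple

# Case 2: the library drives the loop; env.step happens somewhere inside learn.
agent.learn(total_timesteps=10_000)
```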
WKR, Oren.
Upvotes: 0
Views: 611
Reputation: 463
But why do you need it? If you take a look at the implementation of any learn method, you will see it is nothing more than an iteration over time steps that calls collect_rollouts and train, plus some logging and setup at the beginning (e.g., for saving the agent later). Your env.step is called inside collect_rollouts.
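Schematically (the names below are illustrative stand-ins, not the actual SB3 internals), learn boils down to a loop like this:

```python
def learn_sketch(total_timesteps, collect_rollouts, train):
    """Illustrative shape of learn(): alternate rollout collection and gradient updates."""
    num_timesteps = 0
    while num_timesteps < total_timesteps:
        # collect_rollouts runs the current policy in the env -- env.step is called here
        num_timesteps += collect_rollouts()
        # train performs gradient updates on the freshly collected data
        train()
```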
I would instead suggest writing a callback based on CheckpointCallback, which saves your agent (model) every N training steps, and attaching that callback to your learn call. In your environment you could then, every N steps, instantiate a "new previous" version of your model by calling ModelClass.load(file) on the file saved by the callback, so that you can select the other player's actions and get self-play inside your environment.
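A minimal sketch of that setup, assuming PPO; my_two_player_env, opponent_obs and the paths are placeholders for your own code:

```python
import glob
import os

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Snapshot the model every N steps (N, path and prefix are just examples)
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./checkpoints/",
    name_prefix="selfplay",
)

model = PPO("MlpPolicy", my_two_player_env)  # your own turn-based environment
model.learn(total_timesteps=1_000_000, callback=checkpoint_callback)

# Inside the environment: periodically reload the opponent (the "previous" agent)
# from the newest checkpoint and use it to pick the other player's moves.
checkpoints = sorted(glob.glob("./checkpoints/selfplay_*_steps.zip"), key=os.path.getmtime)
if checkpoints:
    opponent = PPO.load(checkpoints[-1])
    opponent_action, _ = opponent.predict(opponent_obs, deterministic=True)
```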
Upvotes: 0