Reputation: 3
I know that total_timesteps= is a required parameter, but how do I end model.learn() after a certain number of episodes? Forgive me, I'm still new to stable-baselines3 and PyTorch and still don't know how to implement this in code.
import gym
import numpy as np

from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make('NeuralTraffic-v1')

# DDPG needs exploration noise added to its continuous actions
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=60, log_interval=1)
model.save("ddpg")

env = model.get_env()
I wanted the run to end at 60 timesteps, but instead my rollout was:
----------------------------------
| rollout/ | |
| ep_len_mean | 94 |
| ep_rew_mean | -2.36e+04 |
| time/ | |
| episodes | 1 |
| fps | 0 |
| time_elapsed | 452 |
| total_timesteps | 94 |
----------------------------------
I don't understand why it is only 1 episode. I'd like to learn how to restrict learning to a specified number of episodes.
Upvotes: 0
Views: 2107
Reputation: 11
A little late to the party, but hopefully this helps others visiting this page.
Based on your ep_len_mean value, each episode of your environment consists of 94 steps before terminating. Setting total_timesteps to 60 means the learning algorithm will only call env.step() 60 times before halting training, which falls short of 1 full episode (94 steps).
To achieve your desired 60 episodes instead of 60 steps, simply take 94 (steps per episode) x 60 (episodes desired) = 5640 (total steps required) and pass that as your total_timesteps parameter, as in the sketch below.
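In code, that looks something like this (a minimal sketch reusing the model from the question; the 94 steps-per-episode figure is read from your rollout log and assumes the episode length stays constant):

steps_per_episode = 94   # taken from ep_len_mean in the rollout log
episodes_desired = 60
# 94 * 60 = 5640 total environment steps
model.learn(total_timesteps=steps_per_episode * episodes_desired, log_interval=1)

If your episode length varies between runs, stable-baselines3 also ships a StopTrainingOnMaxEpisodes callback (see the SB3 callbacks docs) that ends learn() after an exact episode count, whatever each episode's length:

from stable_baselines3.common.callbacks import StopTrainingOnMaxEpisodes

# Stop training once 60 episodes have completed, regardless of their length;
# total_timesteps just needs to be large enough not to cut training off first.
callback_max_episodes = StopTrainingOnMaxEpisodes(max_episodes=60, verbose=1)
model.learn(total_timesteps=int(1e10), callback=callback_max_episodes)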
Upvotes: 1
Reputation: 1216
Generic Box2D and classic-control environments are typically capped at around 1000 timesteps per episode, but episode length is not constant: the agent can do something weird at the beginning and the environment can reset itself, resulting in an uneven number of timesteps per episode. So the norm is to benchmark against a fixed timestep budget (1e6 in most model-free RL research papers) rather than a fixed number of episodes. As you can see in the SB3 docs, the DDPG.learn method does not provide an argument for setting the number of episodes, and it really is best to think in terms of a specific number of timesteps. I see that you have written 60 as total_timesteps; that is far too little to train an RL agent. Try something like 1e5 or 1e6 and you might see good results, as in the sketch below. Good luck!
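For instance (a minimal sketch reusing the DDPG model from the question; log_interval=10 is just an illustrative choice):

# 1e6 environment steps is a common budget for model-free RL benchmarks;
# total_timesteps must be an int, hence the explicit cast
model.learn(total_timesteps=int(1e6), log_interval=10)
model.save("ddpg")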
Upvotes: 1