Reputation: 1
I launched a hyperopt algorithm on a custom gym environment.
This is my code:
from ray import air, tune
from ray.tune.search.hyperopt import HyperOptSearch

config = {
"env": "affecta",
"sgd_minibatch_size": 1000,
"num_sgd_iter": 100,
"lr": tune.uniform(5e-6, 5e-2),
"lambda": tune.uniform(0.6, 0.99),
"vf_loss_coeff": tune.uniform(0.6, 0.99),
"kl_target": tune.uniform(0.001, 0.01),
"kl_coeff": tune.uniform(0.5, 0.99),
"entropy_coeff": tune.uniform(0.001, 0.01),
"clip_param": tune.uniform(0.4, 0.99),
"train_batch_size": 200, # taille de l'épisode
# "monitor": True,
# "model": {"free_log_std": True},
"num_workers": 6,
"num_gpus": 0,
# "rollout_fragment_length":3
# "batch_mode": "complete_episodes"
}
current_best_params = [{
'lr': 5e-4,
}]
config = explore(config)
optimizer = HyperOptSearch(metric="episode_reward_mean", mode="max", n_initial_points=20, random_state_seed=7, space=config)
# optimizer = ConcurrencyLimiter(optimizer, max_concurrent=4)
tuner = tune.Tuner(
"PPO",
tune_config=tune.TuneConfig(
# metric="episode_reward_mean", # the metric we want to study
# mode="max", # maximize the metric
search_alg=optimizer,
# num_samples repeats the entire config 'num_samples' times == number of trials in the 'Status' output
num_samples=10,
),
run_config=air.RunConfig(stop={"training_iteration": 3}, local_dir="test_avec_inoffensifs"),
# limits the number of training iterations for each hyperparameter combination
)
results = tuner.fit()
The problem is that the dataframes returned at each iteration of the hyperopt algorithm contain NaN values for the rewards... I tried several environments, and it is always the same.
Thank you in advance :)
Upvotes: 0
Views: 192
Reputation: 48
The returned rewards are independent of the HP optimization algorithm.
If the train_batch_size is 200 but you have tiny rollout fragment lengths, you probably run into an issue related to num_workers * rollout_fragment_length only being 18. So you collect very few samples (18!) on every iteration and train on them, but there is never a full episode to calculate the mean reward from, even after three iterations.
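Concretely, the arithmetic looks like this, assuming the commented-out rollout_fragment_length of 3 from the question was what actually ran:

num_workers = 6              # from the question's config
rollout_fragment_length = 3  # the commented-out value in the question
samples_per_iteration = num_workers * rollout_fragment_length  # 6 * 3 = 18
# If no episode terminates within those 18 steps, episode_reward_mean
# has nothing to average over and is reported as NaN.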
Collecting complete episodes, a larger rollout_fragment_length, and/or a lower train_batch_size should do the trick.
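For example, a minimal sketch of those changes (the exact values are illustrative, not tuned):

config = {
    "env": "affecta",
    # ... other keys as in the question ...
    "batch_mode": "complete_episodes",  # only ever train on finished episodes
    "rollout_fragment_length": 200,     # larger fragments per worker, and/or:
    "train_batch_size": 200,            # or lower, so an iteration doesn't need
                                        # more samples than the workers deliver
}

With complete episodes in the batch, episode_reward_mean always has at least one finished episode to average over.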
Upvotes: 0