Reputation: 89
I am having a hard time understanding the need for what seem like two search space definitions in the same program flow. The tune.Tuner() object takes a param_space argument, where we can set up the hyperparameter space to search over; however, it can also take a scheduler.
As an example, I have a HuggingFace transformers setup with a Population Based Training scheduler that has its own hyperparam_mutations, which looks like another hyperparameter space to search over.
import ray
import torch
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

num_tune_trials = 3
batch_size = 2
num_labels = 2
model_ckpt = 'imaginary_ckpt'
model_name = f"{model_ckpt}-finetuned"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        model_ckpt, num_labels=num_labels).to(device)
def training_args():
    return TrainingArguments(output_dir=model_name,
                             num_train_epochs=4,
                             learning_rate=2e-5,
                             per_device_train_batch_size=batch_size,
                             per_device_eval_batch_size=batch_size,
                             weight_decay=0.01,
                             evaluation_strategy="epoch",
                             push_to_hub=False,
                             log_level="error")
def trainer_hyperparam():
    # compute_metrics and data_encoded are defined elsewhere.
    # model_init (rather than a model instance) is passed so that a fresh
    # model can be instantiated for every trial.
    return Trainer(args=training_args(),
                   compute_metrics=compute_metrics,
                   train_dataset=data_encoded["train"],
                   eval_dataset=data_encoded["validation"],
                   model_init=model_init,
                   tokenizer=tokenizer)
trainer = trainer_hyperparam()
tune_config = {
    "per_device_train_batch_size": batch_size,
    "per_device_eval_batch_size": batch_size,
}
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_accuracy",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.005, 0.02),
        "learning_rate": tune.uniform(1e-6, 1e-3),  # lower bound first
        "per_device_train_batch_size": [4, 5, 6, 7, 8, 9],
    },
)
reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "per_device_train_batch_size": "train_bs/gpu",
    },
    metric_columns=["eval_accuracy", "eval_loss", "epoch", "training_iteration"],
)
trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=num_tune_trials,
    resources_per_trial={"cpu": 4, "gpu": 1},
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    stop=None,
    progress_reporter=reporter,
    local_dir="~/ray_results/",
    name="tune_transformer_pbt",
)
Upvotes: 1
Views: 401
Reputation: 111
This section in one of the PBT user guides touches on both questions. In particular, param_space is used to get the initial samples, while hyperparam_mutations specifies the resample distributions (resampling being one of the possible mutation operations) and determines which parameters actually get mutated. If a parameter is not specified in param_space, PBT samples its initial value from hyperparam_mutations instead.
If you only want the learning rate to be mutated, then that is the only parameter that should be specified in hyperparam_mutations.
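To make the split concrete, here is a minimal sketch against the plain ray.tune.Tuner API. The trainable, metric values, and search bounds are placeholders rather than your transformers setup, and the metric-reporting call differs between Ray versions:

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

def train_fn(config):
    # Dummy trainable: config["lr"] is drawn from param_space, while
    # config["weight_decay"] gets its initial value from hyperparam_mutations
    # because it is not listed in param_space.
    for step in range(10):
        dummy_accuracy = step * config["lr"] + config["weight_decay"]
        # Ray >= 2.7 reports metrics via ray.train.report; older versions
        # use session.report / tune.report instead. A real PBT trainable
        # should also save/load checkpoints so the exploit step can copy
        # weights between trials (omitted here for brevity).
        train.report({"eval_accuracy": dummy_accuracy})

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=1,
    # Mutation space: which parameters get perturbed, and the distributions
    # used when PBT resamples a value.
    hyperparam_mutations={
        "lr": tune.uniform(1e-6, 1e-3),
        "weight_decay": tune.uniform(0.005, 0.02),
    },
)

tuner = tune.Tuner(
    train_fn,
    # Initial sampling space: where each trial's starting values come from.
    # Parameters missing here (weight_decay) are sampled initially from
    # hyperparam_mutations.
    param_space={"lr": tune.loguniform(1e-6, 1e-3)},
    tune_config=tune.TuneConfig(
        metric="eval_accuracy",
        mode="max",
        scheduler=pbt,
        num_samples=4,
    ),
)
results = tuner.fit()

With this setup each trial's initial lr comes from param_space, weight_decay is filled in from hyperparam_mutations, and at every perturbation_interval PBT either perturbs or resamples both parameters using the distributions in hyperparam_mutations.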
Upvotes: 2