Reputation: 89
I am having a hard time understanding the need for what seem like two search space definitions in the same program flow. The tune.Tuner() object takes a param_space argument, where we can set up the hyperparameter space to search over; however, it can also take a scheduler.
As an example, I have a HuggingFace transformers setup with a Population Based Training scheduler that has its own hyperparam_mutations, which looks like another hyperparameter space to search over.
import ray
import torch
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

num_tune_trials = 3
batch_size = 2
num_labels = 2
model_ckpt = 'imaginary_ckpt'
model_name = f"{model_ckpt}-finetuned"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        model_ckpt, num_labels=num_labels).to(device)
def training_args():
    return TrainingArguments(output_dir=model_name,
                             num_train_epochs=4,
                             learning_rate=2e-5,
                             per_device_train_batch_size=batch_size,
                             per_device_eval_batch_size=batch_size,
                             weight_decay=0.01,
                             evaluation_strategy="epoch",
                             push_to_hub=False,
                             log_level="error")
def trainer_hyperparam():
    # compute_metrics and data_encoded are defined elsewhere.
    # model_init (rather than a model instance) is passed so that a fresh
    # model can be instantiated for every trial.
    return Trainer(args=training_args(),
                   compute_metrics=compute_metrics,
                   train_dataset=data_encoded["train"],
                   eval_dataset=data_encoded["validation"],
                   model_init=model_init,
                   tokenizer=tokenizer)
trainer = trainer_hyperparam()
tune_config = {
    "per_device_train_batch_size": batch_size,
    "per_device_eval_batch_size": batch_size,
}
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_accuracy",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.005, 0.02),
        "learning_rate": tune.uniform(1e-6, 1e-3),  # lower bound first
        "per_device_train_batch_size": [4, 5, 6, 7, 8, 9],
    },
)
reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "per_device_train_batch_size": "train_bs/gpu",
    },
    metric_columns=["eval_accuracy", "eval_loss", "epoch", "training_iteration"],
)
trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=num_tune_trials,
    resources_per_trial={"cpu": 4, "gpu": 1},
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    stop=None,
    progress_reporter=reporter,
    local_dir="~/ray_results/",
    name="tune_transformer_pbt",
)
Upvotes: 1
Views: 401
Reputation: 111
This section in one of the PBT user guides touches on both questions. In particular, param_space is used to get the initial samples, while hyperparam_mutations specifies the resample distributions (resampling being one of the possible mutation operations) and determines which parameters actually get mutated. If a parameter is not specified in param_space, PBT samples its initial value from hyperparam_mutations instead.
If you only want the learning rate to be mutated, then that is the only parameter that should be specified in hyperparam_mutations.
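To make the split concrete, here is a minimal sketch against the plain ray.tune.Tuner API. The trainable, metric values, and search bounds are placeholders rather than your transformers setup, and the metric-reporting call differs between Ray versions:

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

def train_fn(config):
    # Dummy trainable: config["lr"] is drawn from param_space, while
    # config["weight_decay"] gets its initial value from hyperparam_mutations
    # because it is not listed in param_space.
    for step in range(10):
        dummy_accuracy = step * config["lr"] + config["weight_decay"]
        # Ray >= 2.7 reports metrics via ray.train.report; older versions
        # use session.report / tune.report instead. A real PBT trainable
        # should also save/load checkpoints so the exploit step can copy
        # weights between trials (omitted here for brevity).
        train.report({"eval_accuracy": dummy_accuracy})

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=1,
    # Mutation space: which parameters get perturbed, and the distributions
    # used when PBT resamples a value.
    hyperparam_mutations={
        "lr": tune.uniform(1e-6, 1e-3),
        "weight_decay": tune.uniform(0.005, 0.02),
    },
)

tuner = tune.Tuner(
    train_fn,
    # Initial sampling space: where each trial's starting values come from.
    # Parameters missing here (weight_decay) are sampled initially from
    # hyperparam_mutations.
    param_space={"lr": tune.loguniform(1e-6, 1e-3)},
    tune_config=tune.TuneConfig(
        metric="eval_accuracy",
        mode="max",
        scheduler=pbt,
        num_samples=4,
    ),
)
results = tuner.fit()

With this setup each trial's initial lr comes from param_space, weight_decay is filled in from hyperparam_mutations, and at every perturbation_interval PBT either perturbs or resamples both parameters using the distributions in hyperparam_mutations.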
Upvotes: 2