jvmunhoz

Reputation: 21

How do I use SageMaker HyperparameterTuner on an SKLearn Estimator?

I'm following an Amazon SageMaker workshop to try to leverage several of SageMaker's utilities instead of running everything off a notebook, as I'm currently doing.

The thing is, the workshop teaches you how to use HyperparameterTuner with the ready-made XGBoost image from AWS, while most of my pipelines use Scikit-Learn models such as GradientBoostingClassifier or RandomForest. So, following this example file, I'm instantiating an estimator like this:

sklearn = SKLearn(entry_point="train.py", 
                  framework_version="1.2-1", 
                  instance_type="ml.m5.xlarge", 
                  role=role,
                  hyperparameters=fixed_hyperparameters
)

After that, I instantiate a HyperparameterTuner job from the estimator I just created, together with ranges for the hyperparameters I want to test:

hyperparameters_ranges = {
    "n_estimators": ContinuousParameter(100, 500),
    "learning_rate": ContinuousParameter(1e-2, 1e-1),
    "max_depth": IntegerParameter(2, 5),
    "subsample": ContinuousParameter(0.6, 1),
    "max_df": ContinuousParameter(0.4, 1),
    "max_features": IntegerParameter(5, 25),
    "use_idf": CategoricalParameter([True, False])
}

metric = "validation:f1"

tuner = HyperparameterTuner(
    sklearn,
    metric,
    hyperparameters_ranges,
    max_jobs=2,
    max_parallel_jobs=2
)
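
For completeness, once the tuner is defined I launch the job like this (the channel names and the S3 paths here are just placeholders from my setup):

tuner.fit({"train": "s3://my-bucket/train", "test": "s3://my-bucket/test"})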

My problem is that I haven't found ANY information on how to access the hyperparameters passed to the SKLearn estimator from inside the train.py file, nor have I found where the optimal hyperparameters are stored so I can use them for the final model. Can someone tell me if that's even possible, or suggest an easier way to do this?

Upvotes: 1

Views: 120

Answers (1)

Ilya V. Schurov

Reputation: 8077

The hyperparameters are passed to train.py via command line arguments, as described in the documentation:

Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.

Here is an example script from the documentation:

import argparse
import os
import json

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)

    # an alternative way to load hyperparameters via SM_HPS environment variable.
    parser.add_argument('--sm-hps', type=json.loads, default=os.environ['SM_HPS'])

    # input data and model directories
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir. 
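
Applied to your scikit-learn case, the body of train.py could look roughly like the sketch below. This is only an illustration under a few assumptions: a GradientBoostingClassifier on a binary classification problem, one CSV file per channel with the label in the first column, and the objective metric printed to the logs. For the tuner to pick that metric up, the tuner also needs a matching metric_definitions regex, e.g. metric_definitions=[{"Name": "validation:f1", "Regex": "validation:f1=([0-9\\.]+)"}].

import argparse
import os

import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters the tuner varies arrive as command-line arguments
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.1)
    parser.add_argument('--max_depth', type=int, default=3)
    parser.add_argument('--subsample', type=float, default=1.0)

    # standard SageMaker input/output locations
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # assumption: one CSV per channel, label in the first column
    train = pd.read_csv(os.path.join(args.train, 'train.csv'))
    test = pd.read_csv(os.path.join(args.test, 'test.csv'))
    X_train, y_train = train.iloc[:, 1:], train.iloc[:, 0]
    X_test, y_test = test.iloc[:, 1:], test.iloc[:, 0]

    model = GradientBoostingClassifier(
        n_estimators=args.n_estimators,
        learning_rate=args.learning_rate,
        max_depth=args.max_depth,
        subsample=args.subsample,
    ).fit(X_train, y_train)

    # print the objective metric so the tuner can scrape it from the logs
    print(f'validation:f1={f1_score(y_test, model.predict(X_test)):.4f}')

    # persist the model where SageMaker expects it
    joblib.dump(model, os.path.join(args.model_dir, 'model.joblib'))

The tuner then varies --n_estimators, --learning_rate, etc. between jobs; the script itself never needs to know it is running inside a tuning job.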

It is also possible to retrieve them from the SM_HP_* environment variables, provided the hyperparameter names do not contain dashes (i.e. batch_size, not batch-size).
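
For example, for a hyperparameter named n_estimators, something like this should work (the uppercasing of the name follows the training toolkit's convention; treat the exact variable name as an assumption):

import os

n_estimators = int(os.environ['SM_HP_N_ESTIMATORS'])

As for the second half of your question: once the tuning job finishes, the tuner object itself can tell you which job won and what its hyperparameters were. A sketch using standard SDK calls:

# name of the training job with the best objective metric value
best_job_name = tuner.best_training_job()

# one row per training job, with the hyperparameter values and the
# final objective metric, as a pandas DataFrame
df = tuner.analytics().dataframe()
print(df.sort_values('FinalObjectiveValue', ascending=False).head())

You can also call tuner.best_estimator() to get an Estimator attached to the best training job, ready to deploy.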

Upvotes: 0
