jvmunhoz

Reputation: 21

How do I use SageMaker HyperparameterTuner on an SKLearn Estimator?

I'm following an Amazon SageMaker workshop to try to leverage several of SageMaker's utilities instead of running everything off a notebook, as I'm currently doing.

The thing is, the workshop teaches you how to use HyperparameterTuner with the ready-made XGBoost image from AWS, while most of my pipelines use Scikit-Learn models such as GradientBoostingClassifier or RandomForest. So, following this example file, I'm instantiating an estimator like this:

sklearn = SKLearn(entry_point="train.py", 
                  framework_version="1.2-1", 
                  instance_type="ml.m5.xlarge", 
                  role=role,
                  hyperparameters=fixed_hyperparameters
)

After that, I instantiate a HyperparameterTuner job from the estimator I just created, together with ranges for the hyperparameters I want to test:

hyperparameters_ranges = {
    "n_estimators": ContinuousParameter(100, 500),
    "learning_rate": ContinuousParameter(1e-2, 1e-1),
    "max_depth": IntegerParameter(2, 5),
    "subsample": ContinuousParameter(0.6, 1),
    "max_df": ContinuousParameter(0.4, 1),
    "max_features": IntegerParameter(5, 25),
    "use_idf": CategoricalParameter([True, False])
}

metric = "validation:f1"

tuner = HyperparameterTuner(
    sklearn,
    metric,
    hyperparameters_ranges,
    max_jobs=2,
    max_parallel_jobs=2
)
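
For completeness, once the tuner is defined I launch the job like this (the channel names and the S3 paths here are just placeholders from my setup):

tuner.fit({"train": "s3://my-bucket/train", "test": "s3://my-bucket/test"})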

My problem is that I haven't found ANY information on how to access the hyperparameters passed to the SKLearn estimator from inside the train.py file, nor have I found where the optimal hyperparameters are stored so I can use them for the final model. Can someone tell me if that's even possible, or suggest an easier way to do this?

Upvotes: 1

Views: 120

Answers (1)

Ilya V. Schurov

Reputation: 8077

The hyperparameters are passed to train.py via command line arguments, as described in the documentation:

Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.

Here is an example script from the documentation:

import argparse
import os
import json

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)

    # an alternative way to load hyperparameters via SM_HPS environment variable.
    parser.add_argument('--sm-hps', type=json.loads, default=os.environ['SM_HPS'])

    # input data and model directories
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir. 
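
Applied to your scikit-learn case, the body of train.py could look roughly like the sketch below. This is only an illustration under a few assumptions: a GradientBoostingClassifier on a binary classification problem, one CSV file per channel with the label in the first column, and the objective metric printed to the logs. For the tuner to pick that metric up, the tuner also needs a matching metric_definitions regex, e.g. metric_definitions=[{"Name": "validation:f1", "Regex": "validation:f1=([0-9\\.]+)"}].

import argparse
import os

import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters the tuner varies arrive as command-line arguments
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.1)
    parser.add_argument('--max_depth', type=int, default=3)
    parser.add_argument('--subsample', type=float, default=1.0)

    # standard SageMaker input/output locations
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # assumption: one CSV per channel, label in the first column
    train = pd.read_csv(os.path.join(args.train, 'train.csv'))
    test = pd.read_csv(os.path.join(args.test, 'test.csv'))
    X_train, y_train = train.iloc[:, 1:], train.iloc[:, 0]
    X_test, y_test = test.iloc[:, 1:], test.iloc[:, 0]

    model = GradientBoostingClassifier(
        n_estimators=args.n_estimators,
        learning_rate=args.learning_rate,
        max_depth=args.max_depth,
        subsample=args.subsample,
    ).fit(X_train, y_train)

    # print the objective metric so the tuner can scrape it from the logs
    print(f'validation:f1={f1_score(y_test, model.predict(X_test)):.4f}')

    # persist the model where SageMaker expects it
    joblib.dump(model, os.path.join(args.model_dir, 'model.joblib'))

The tuner then varies --n_estimators, --learning_rate, etc. between jobs; the script itself never needs to know it is running inside a tuning job.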

It is also possible to retrieve them from the SM_HP_* environment variables, provided the hyperparameter names do not contain dashes (i.e. batch_size, not batch-size).
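
For example, for a hyperparameter named n_estimators, something like this should work (the uppercasing of the name follows the training toolkit's convention; treat the exact variable name as an assumption):

import os

n_estimators = int(os.environ['SM_HP_N_ESTIMATORS'])

As for the second half of your question: once the tuning job finishes, the tuner object itself can tell you which job won and what its hyperparameters were. A sketch using standard SDK calls:

# name of the training job with the best objective metric value
best_job_name = tuner.best_training_job()

# one row per training job, with the hyperparameter values and the
# final objective metric, as a pandas DataFrame
df = tuner.analytics().dataframe()
print(df.sort_values('FinalObjectiveValue', ascending=False).head())

You can also call tuner.best_estimator() to get an Estimator attached to the best training job, ready to deploy.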

Upvotes: 0
