Reputation: 21
I'm following an Amazon SageMaker workshop to try to leverage several of SageMaker's utilities instead of running everything off a notebook as I'm currently doing.
The thing is, the workshop teaches you how to use HyperparameterTuner with the ready-made XGBoost image from AWS, while most of my pipelines use scikit-learn models such as GradientBoostingClassifier or RandomForest. So, following this example file, I'm instantiating an estimator like this:
from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point="train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    role=role,
    hyperparameters=fixed_hyperparameters,
)
After that, I instantiate a HyperparameterTuner job from the estimator I just created, with ranges for the hyperparameters I want to test.
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

hyperparameters_ranges = {
    "n_estimators": IntegerParameter(100, 500),  # integer-valued, so IntegerParameter rather than ContinuousParameter
    "learning_rate": ContinuousParameter(1e-2, 1e-1),
    "max_depth": IntegerParameter(2, 5),
    "subsample": ContinuousParameter(0.6, 1),
    "max_df": ContinuousParameter(0.4, 1),
    "max_features": IntegerParameter(5, 25),
    "use_idf": CategoricalParameter([True, False]),
}
metric = "validation:f1"
tuner = HyperparameterTuner(
    sklearn,
    metric,
    hyperparameters_ranges,
    max_jobs=2,
    max_parallel_jobs=2,
)
My problem is that I haven't found ANY information on how to access the hyperparameters passed to the SKLearn estimator inside the train.py file. Nor have I found where the optimal hyperparameters are stored so I can use them for the final model. Can someone tell me if that's even possible, or suggest alternatives if there's an easier way to do this?
Upvotes: 1
Views: 120
Reputation: 8077
The hyperparameters are passed to train.py via command-line arguments, as described in the documentation:
Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.
Here is an example script from the documentation:
import argparse
import os
import json

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)

    # an alternative way to load hyperparameters, via the SM_HPS environment variable.
    parser.add_argument('--sm-hps', type=json.loads, default=os.environ['SM_HPS'])

    # input data and model directories
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.
It is also possible to retrieve them from the SM_HP_* environment variables, provided their names do not contain dashes (i.e. batch_size, not batch-size).
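For example, for a hyperparameter named n_estimators the lookup would be roughly this (a minimal sketch; I'm assuming SageMaker's convention of uppercasing the name in the SM_HP_* variable):

import os

# assuming the estimator was given a hyperparameter named "n_estimators";
# SageMaker exposes it as an environment variable with the name uppercased
n_estimators = int(os.environ["SM_HP_N_ESTIMATORS"])

As for where the optimal hyperparameters are stored: once the tuning job has finished, you can read them back through the SageMaker SDK. A minimal sketch, assuming the tuner object from your question and a completed job:

import boto3

# name of the training job that achieved the best objective metric value
best_job_name = tuner.best_training_job()

# per-job results (hyperparameter values plus final objective metric) as a pandas DataFrame
results = tuner.analytics().dataframe()
print(results.sort_values("FinalObjectiveValue", ascending=False).head())

# the winning hyperparameter values themselves, via the low-level API
sm = boto3.client("sagemaker")
desc = sm.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.name
)
print(desc["BestTrainingJob"]["TunedHyperParameters"])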
Upvotes: 0