Reputation: 515
I am using Amazon SageMaker to run an XGBoost model in order to get the best hyperparameter combination. I have to use the SageMaker implementation rather than the notebook alternative, to test whether it runs faster than a grid search. My problem is how to make this work in a loop. Any ideas? My understanding is that I have to create numerous jobs with different combinations. I tried this as a test:
from time import gmtime, strftime
import copy
import boto3

for i in range(1, 3):
    for j in range(13, 15):
        job_name = 'regression' + '-' + str(i) + "-" + str(j) + "-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
        job_name_params = copy.deepcopy(parameters_xgboost)
        job_name_params['TrainingJobName'] = job_name
        job_name_params['OutputDataConfig']['S3OutputPath'] = "....."
        job_name_params['HyperParameters']['objective'] = "reg:linear"
        job_name_params['HyperParameters']['silent'] = "0"
        job_name_params['HyperParameters']['max_depth'] = str(i)
        job_name_params['HyperParameters']['min_child_weight'] = str(j)
        job_name_params['HyperParameters']['eta'] = "0.01"
        job_name_params['HyperParameters']['num_round'] = "1000"
        job_name_params['HyperParameters']['subsample'] = "0.5"
        job_name_params['HyperParameters']['colsample_bytree'] = "0.5"
        sm = boto3.Session().client('.....')
        sm.create_training_job(**job_name_params)
        sm.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)
        status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
        print("Training job ended with status: " + status)
parameters_xgboost is a dict holding the basic job configuration and the list of hyperparameters that SageMaker reads.
The good thing is that it works. The bad thing is that it trains the models one at a time. I would like all of these combinations to run at the same time. How can I do that?
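The sequential behavior comes from calling the waiter inside the loop: each iteration blocks until its job finishes before the next one is submitted. A minimal sketch of decoupling submission from waiting is below; the `base_params` dict, job names, and S3 path are placeholders standing in for your `parameters_xgboost` and real output path:

```python
import copy

# Placeholder base config standing in for parameters_xgboost.
base_params = {
    'TrainingJobName': '',
    'OutputDataConfig': {'S3OutputPath': 's3://my-bucket/output'},  # placeholder
    'HyperParameters': {},
}

def build_all_job_params(base, max_depths, min_child_weights):
    """Build one job-parameter dict per (max_depth, min_child_weight) combination."""
    jobs = []
    for i in max_depths:
        for j in min_child_weights:
            p = copy.deepcopy(base)
            p['TrainingJobName'] = 'regression-%d-%d' % (i, j)
            p['HyperParameters']['max_depth'] = str(i)
            p['HyperParameters']['min_child_weight'] = str(j)
            jobs.append(p)
    return jobs

jobs = build_all_job_params(base_params, range(1, 3), range(13, 15))

# Submit everything first; the jobs then run concurrently, subject to your
# account's limit on simultaneous training jobs:
# sm = boto3.Session().client('sagemaker')
# for p in jobs:
#     sm.create_training_job(**p)
#
# ...and only start waiting after all jobs have been submitted:
# for p in jobs:
#     sm.get_waiter('training_job_completed_or_stopped').wait(
#         TrainingJobName=p['TrainingJobName'])
```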
Upvotes: 2
Views: 853
Reputation: 12929
Amazon SageMaker offers a tuning service that automatically runs hyperparameter optimization (HPO) for you. The service is now generally available; see the documentation for more information.
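With the tuning service you define the ranges once and SageMaker launches the training jobs in parallel for you, via `create_hyper_parameter_tuning_job`. A sketch of the tuning-job config is below; the metric name, resource limits, and ranges are illustrative placeholders, and `TrainingJobDefinition` (not shown) carries the same algorithm/role/data settings as a single training job:

```python
# Illustrative config for boto3's create_hyper_parameter_tuning_job.
tuning_job_config = {
    'Strategy': 'Bayesian',
    'HyperParameterTuningJobObjective': {
        'Type': 'Minimize',
        'MetricName': 'validation:rmse',  # placeholder objective metric
    },
    'ResourceLimits': {
        'MaxNumberOfTrainingJobs': 20,
        'MaxParallelTrainingJobs': 4,  # up to 4 jobs run concurrently
    },
    'ParameterRanges': {
        'IntegerParameterRanges': [
            {'Name': 'max_depth', 'MinValue': '1', 'MaxValue': '3'},
            {'Name': 'min_child_weight', 'MinValue': '13', 'MaxValue': '15'},
        ],
        'ContinuousParameterRanges': [],
        'CategoricalParameterRanges': [],
    },
}

# sm = boto3.Session().client('sagemaker')
# sm.create_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName='xgboost-tuning-job',
#     HyperParameterTuningJobConfig=tuning_job_config,
#     TrainingJobDefinition=training_job_definition,
# )
```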
Upvotes: 1