Alex

Reputation: 320

SageMaker: Optimize Batch Transform Time for Built-In Algorithm

I've got an XGBoost model trained with a SageMaker Hyperparameter Tuning job. Now I want to generate predictions for about 182 GB of CSV files. I've been testing different combinations of instance types, instance counts, MaxPayloadInMB, and MaxConcurrentTransforms, but I haven't been able to get the job to run faster than about 30 minutes. Am I missing anything that could speed this up? Here is my current boto3 call:

import boto3

client = boto3.client("sagemaker")

response = client.create_transform_job(
    TransformJobName=transform_name,
    ModelName=model_name,
    BatchStrategy="MultiRecord",
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{prefix}/csv_prediction",
            }
        },
        "ContentType": "text/csv",
        "CompressionType": "None",
        "SplitType": "Line",
    },
    MaxPayloadInMB=1,
    MaxConcurrentTransforms=100,
    DataProcessing={
        "InputFilter": "$[1:]",  # Use all columns except the first (the ID)
        "JoinSource": "Input",
        "OutputFilter": "$[0,-1]",  # Return the ID and the prediction only
    },
    TransformOutput={
        "S3OutputPath": f"s3://{bucket}/{prefix}/batch_transform_results/{model_name}",
        "Accept": "text/csv",
        "AssembleWith": "Line",
    },
    TransformResources={
        "InstanceType": "ml.c5.xlarge",
        "InstanceCount": 16,
    },
)

Upvotes: 0

Views: 662

Answers (2)

Yann Stoneman

Reputation: 1208

Sometimes using a larger instance will not only be faster but also more cost-effective: if the job finishes much faster, the overall cost may be lower even though the instance is more expensive per hour.

With that said, have you considered using something larger than an xlarge? That's the second smallest compute-optimized instance size. You can go all the way up to a 24xlarge with the c5 instance type, with five other sizes in between. Plus, there's a newer generation, c6g, of Graviton-based instances.

However, XGBoost is a memory-bound, not compute-bound, algorithm. So a general-purpose instance (for example, M5) is a better choice than a compute-optimized instance (for example, C5).
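For example, the only change needed in your call would be the TransformResources block. A minimal sketch, where ml.m5.4xlarge and the count of 8 are illustrative values rather than a recommendation:

# Sketch: same transform job, but on memory-oriented general-purpose instances.
# ml.m5.4xlarge and InstanceCount=8 are illustrative, not tuned values.
transform_resources = {
    "InstanceType": "ml.m5.4xlarge",
    "InstanceCount": 8,
}
# Passed as TransformResources=transform_resources in the call from the question.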

Have you tried using AWS's built-in algorithm for XGBoost, which has some optimizations for the environment? For XGBoost, the docs say that, "[the built-in] implementation has a smaller memory footprint, better logging, improved hyperparameter validation, and an expanded set of metrics than the original versions."
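If you want to try that, here is a minimal sketch of pulling the built-in container with the SageMaker Python SDK; the version string "1.5-1" is an assumption, so check what is current in your region:

import sagemaker

# Sketch: retrieve the SageMaker built-in XGBoost container image.
# The version "1.5-1" is an assumption; use whatever is current.
region = sagemaker.Session().boto_region_name
xgboost_image = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.5-1",
)
print(xgboost_image)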

Finally -- and this may be the solution in combination with using the built-in algorithm -- have you checked AWS's "EC2 Instance Recommendation for the XGBoost Algorithm"? Here's an excerpt:

SageMaker XGBoost version 1.2 or later supports single-instance GPU training. Despite higher per-instance costs, GPUs train more quickly, making them more cost effective. SageMaker XGBoost version 1.2 or later supports P2 and P3 instances.

SageMaker XGBoost version 1.2-2 or later supports P2, P3, G4dn, and G5 GPU instance families.

To take advantage of GPU training, specify the instance type as one of the GPU instances (for example, P3) and set the tree_method hyperparameter to gpu_hist in your existing XGBoost script. SageMaker XGBoost currently does not support multi-GPU training.
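A rough sketch of that setup with the SageMaker Python SDK follows; the version, instance type, objective, and the role and output_path variables are all illustrative assumptions, not values from the question:

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name

# Sketch: built-in XGBoost training on a single GPU instance.
# Version "1.5-1" and ml.p3.2xlarge are illustrative choices;
# `role` and `output_path` are assumed to be defined elsewhere.
image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,  # multi-GPU training is not supported, per the excerpt above
    instance_type="ml.p3.2xlarge",
    output_path=output_path,
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    tree_method="gpu_hist",       # enables GPU training
    objective="binary:logistic",  # assumption: swap in your actual objective
    num_round=100,                # assumption
)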

Upvotes: 1

Raghu Ramesha

Reputation: 484

When you use an instance type with more CPU cores, that generally means you can increase MaxConcurrentTransforms, which controls the number of concurrent /invocations requests in flight to the model server at any given time. The rule of thumb is to set MaxConcurrentTransforms equal to the number of cores, although it requires some empirical testing to find out whether your particular model implementation can keep up with a faster request rate without breaking. Model servers generally do match the rule of thumb, setting the number of webserver workers equal to the number of cores.
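As a sketch of that rule of thumb (the vCPU counts below are the published EC2 numbers for those sizes, but verify them for whatever type you actually use):

import boto3

client = boto3.client("sagemaker")

# Sketch: pick MaxConcurrentTransforms from the vCPU count of the instance.
# This mapping covers only a few illustrative sizes.
VCPUS_BY_TYPE = {
    "ml.c5.xlarge": 4,
    "ml.c5.4xlarge": 16,
    "ml.m5.4xlarge": 16,
}

def concurrency_for(instance_type: str) -> int:
    """Rule of thumb: one in-flight /invocations request per vCPU."""
    return VCPUS_BY_TYPE[instance_type]

# Plugs into the create_transform_job call from the question, e.g.:
#   MaxConcurrentTransforms=concurrency_for("ml.c5.4xlarge")  # -> 16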

There may also be room to tune BatchStrategy and MaxPayloadInMB for better throughput; for example, passing larger multi-record payloads allows the model to complete the same amount of work with fewer total requests, reducing any overhead that builds up from frequent HTTP communication. Again, it depends on how large a request payload the model server can handle, which may also depend on how much memory is needed and available on the given instance type.
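As a starting point, a hedged sketch of those settings might look like this; the numbers are arbitrary, not tuned values, and MaxPayloadInMB is capped at 100 by the API:

# Sketch: trade request frequency for payload size. These are arbitrary
# starting points, not tuned values; measure throughput at a few settings.
payload_settings = {
    "BatchStrategy": "MultiRecord",  # pack many CSV lines into one request
    "MaxPayloadInMB": 6,             # fewer, larger HTTP requests (API cap: 100)
    "MaxConcurrentTransforms": 4,    # keep aligned with vCPUs, as above
}

# Merged into the create_transform_job call from the question, e.g.:
#   client.create_transform_job(..., **payload_settings)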

Upvotes: 1
