B_Miner

Reputation: 1820

SageMaker Limits on Sklearn Batch Transformer Payload

I am following the gist of this tutorial:

https://aws.amazon.com/blogs/machine-learning/preprocess-input-data-before-making-predictions-using-amazon-sagemaker-inference-pipelines-and-scikit-learn/

where I am using a custom sklearn transformer to pre-process data before passing it to XGBoost. When I get to this point:

transformer = sklearn_preprocessor.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    assemble_with='Line',
    accept='text/csv')

# Preprocess training input
transformer.transform('s3://{}/{}'.format(input_bucket, input_key), content_type='text/csv')
print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
transformer.wait()
preprocessed_train = transformer.output_path

The training data is in S3, spread across multiple files. I get an error saying that the max payload has been exceeded, and it appears that the max payload can only be set up to 100MB. Does this mean that SageMaker cannot transform larger data as input into another process?

Upvotes: 2

Views: 2134

Answers (1)

Alohahaha

Reputation: 106

In SageMaker batch transform, MaxPayloadInMB * MaxConcurrentTransforms cannot exceed 100MB. However, a payload is only the data portion of a single request sent to your model, not the whole input file. In your case, since the input is CSV, you can set split_type to 'Line' in the transform() call, and each CSV line will be treated as a record.
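To make the limit concrete, here is a minimal check of the product rule stated above. The function name and the sample values are made up for illustration; only the 100MB figure comes from the answer.

```python
MAX_TOTAL_MB = 100  # limit quoted above for payload size times concurrency


def within_batch_limit(max_payload_mb, max_concurrent_transforms=1):
    """Return True if the two settings stay within the 100 MB product limit."""
    return max_payload_mb * max_concurrent_transforms <= MAX_TOTAL_MB


# Hypothetical settings: 6 MB payloads with 4 concurrent requests, then 50 MB with 4.
print(within_batch_limit(6, 4))
print(within_batch_limit(50, 4))
```

Note that the check involves only per-request settings; the size of the input files in S3 does not appear anywhere in it.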

If the batch_strategy is "MultiRecord" (the default value), each payload will contain as many records (lines) as fit within the payload limit.

If the batch_strategy is "SingleRecord", each payload will contain a single CSV line, and you need to ensure that no line is larger than max_payload_size_in_MB.
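The difference between the two strategies can be pictured with a small simulation. This is only an illustrative model of the behaviour described above, not SageMaker's actual implementation; `pack_lines` and the byte limit are hypothetical.

```python
def pack_lines(lines, max_payload_bytes, strategy="MultiRecord"):
    """Group CSV lines into request payloads under a per-payload size cap.

    Illustrative model only; this is not SageMaker's actual implementation.
    """
    payloads = []
    current, current_size = [], 0
    for line in lines:
        size = len(line.encode("utf-8")) + 1  # +1 for the newline delimiter
        if size > max_payload_bytes:
            raise ValueError("a single record exceeds the max payload size")
        if strategy == "SingleRecord":
            payloads.append([line])  # one line per request
            continue
        # MultiRecord: start a new payload once the next line would not fit.
        if current and current_size + size > max_payload_bytes:
            payloads.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        payloads.append(current)
    return payloads


rows = ["1,2,3", "4,5,6", "7,8,9", "10,11,12"]
print(pack_lines(rows, max_payload_bytes=12))
print(pack_lines(rows, max_payload_bytes=12, strategy="SingleRecord"))
```

In both cases the input is cut into payloads that each respect the cap, so the total number of lines can grow without the cap ever being hit.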

In short, if split_type is specified (not 'None'), max_payload_size_in_MB has nothing to do with the total size of your input file.
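Applied to the code in the question, that could look like the sketch below. It assumes the SageMaker Python SDK, where (as far as I know) the batch strategy and max payload are passed to transformer() as `strategy` and `max_payload`, and `split_type` is passed to transform(). The wrapper function name and the 6MB value are only placeholders.

```python
def preprocess_with_line_split(sklearn_preprocessor, input_uri, max_payload_mb=6):
    """Run the batch transform from the question with line-level splitting.

    `sklearn_preprocessor` is the fitted estimator from the question;
    `max_payload_mb` is an illustrative value, not a recommendation.
    """
    transformer = sklearn_preprocessor.transformer(
        instance_count=1,
        instance_type='ml.m4.xlarge',
        strategy='MultiRecord',      # pack as many lines per request as fit
        assemble_with='Line',
        accept='text/csv',
        max_payload=max_payload_mb,  # per-request cap in MB, not a cap on file size
    )
    transformer.transform(
        input_uri,
        content_type='text/csv',
        split_type='Line',           # treat each CSV line as one record
    )
    print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
    transformer.wait()
    return transformer.output_path
```

You would call it as `preprocess_with_line_split(sklearn_preprocessor, 's3://{}/{}'.format(input_bucket, input_key))`, using the same variables as in the question.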

https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html#SageMaker-CreateTransformJob-request-MaxPayloadInMB

Hope this helps!

Upvotes: 1
