user1657939

Reputation: 21

Optimize batch transform inference on sagemaker

With my current batch transform inference setup I see several bottlenecks:

  1. Each input file can only hold close to 1,000 records.
  2. It is currently processing about 2,000 records/min on one ml.g4dn.12xlarge instance.
  3. GPU instances are not necessarily giving any advantage over CPU instances.

I wonder if this is a limitation of the currently available TensorFlow Serving container (v2.8). If that's the case, which configs should I tune to increase performance?

I tried changing max_concurrent_transforms, but it doesn't seem to help.

My current config:

transformer = tensorflow_serving_model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.12xlarge",
    max_concurrent_transforms=0,
    output_path=output_data_path,
)

transformer.transform(
    data=input_data_path,
    split_type='Line',
    content_type="text/csv",
    job_name=job_name + datetime.now().strftime("%m-%d-%Y-%H-%M-%S"),
)
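
If the ~1,000-records-per-file cap is part of the bottleneck, one workaround (a sketch, not from the original post; `shard_csv` is a hypothetical helper name) is to split the input into many smaller CSV shards before uploading to S3. Batch transform distributes whole files across instances, so more files also give a multi-instance job more units of parallel work:

```python
import csv
import os

def shard_csv(src_path, out_dir, rows_per_file=1000):
    """Split a headerless CSV into shards of at most rows_per_file rows each.

    Returns the list of shard paths, in order. Batch transform assigns
    whole files to workers, so smaller shards allow finer parallelism.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    shard, writer = None, None
    with open(src_path, newline="") as src:
        for i, row in enumerate(csv.reader(src)):
            if i % rows_per_file == 0:
                # Start a new shard file every rows_per_file rows.
                if shard:
                    shard.close()
                path = os.path.join(out_dir, f"part-{i // rows_per_file:05d}.csv")
                shard = open(path, "w", newline="")
                writer = csv.writer(shard)
                shard_paths.append(path)
            writer.writerow(row)
    if shard:
        shard.close()
    return shard_paths
```

You would then upload the shard directory to S3 and point `input_data_path` at the prefix rather than a single object.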

Upvotes: 1

Views: 1778

Answers (1)

Gili Nachum

Reputation: 5578

Generally speaking, you should first have a well-performing model (steps 1 and 2 below) yielding a satisfactory TPS before you move on to batch transform parallelization knobs to push your overall TPS higher.
Steps:

  1. GPU enabling - Run a manual test to confirm that your model can utilize a GPU in the first place (this isn't related to batch transform).
  2. Picking an instance - Use SageMaker Inference Recommender to find the most cost-effective instance type to run inference on.
  3. Batch transform inputs - It sounds like you have multiple input files; this is required if you want to speed up the job by adding more instances.
  4. Batch transform single-instance knobs - If you are using the CreateTransformJob API, you can reduce the time it takes to complete batch transform jobs by using optimal values for parameters such as MaxPayloadInMB, MaxConcurrentTransforms, or BatchStrategy. The ideal value for MaxConcurrentTransforms is equal to the number of compute workers in the batch transform job. If you are using the SageMaker console, you can specify these values in the Additional configuration section of the Batch transform job configuration page. SageMaker automatically finds the optimal parameter settings for built-in algorithms; for custom algorithms, provide these values through an execution-parameters endpoint.
  5. Batch transform cluster size - Increase instance_count beyond 1, using the cost-effective instance type you found in steps (1)+(2).
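
For step 5, a rough back-of-the-envelope sizing helper (illustrative only; `instances_needed` is a made-up name, the 2,000 records/min figure is the single-instance throughput reported in the question, and the estimate assumes throughput scales roughly linearly with instance_count, which real jobs only approximate):

```python
import math

def instances_needed(total_records, records_per_min_per_instance, target_minutes):
    """Estimate the instance_count needed to finish a batch transform job
    within target_minutes, assuming near-linear scaling across instances."""
    required_per_min = total_records / target_minutes
    return max(1, math.ceil(required_per_min / records_per_min_per_instance))

# e.g. 1M records at the reported 2,000 records/min per instance,
# with a 60-minute completion target:
print(instances_needed(1_000_000, 2_000, 60))  # -> 9
```

Whatever number this suggests, validate it against the per-instance TPS you measured in steps 1 and 2 before committing to a cluster size.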

Upvotes: 2

Related Questions