JRJR

Reputation: 31

Use AWS Lambda to run Sagemaker Batch Transform job

I would like to place a CSV file in an S3 bucket and automatically get predictions from a SageMaker model using a batch transform job. My plan is to use an S3 event notification (on CSV upload) to trigger a Lambda function that starts the batch transform job. The Lambda function I have written so far is this:


import boto3
sagemaker = boto3.client('sagemaker')

input_data_path = 's3://yeex/upload/examples.csv'.format(default_bucket, 's3://yeex/upload/', 'examples.csv')
output_data_path = 's3://nooz/download/'.format(default_bucket, 's3://nooz/download')

transform_job = sagemaker.transformer.Transformer(
    model_name = y_xgboost_21,
    instance_count = 1,
    instance_type = 'ml.m5.large',
    strategy = 'SingleRecord',
    assemble_with = 'Line',
    output_path = output_data_path,
    base_transform_job_name='y-test-batch',
    sagemaker_session=sagemaker.Session(),
    accept = 'text/csv')

transform_job.transform(data = input_data_path, 
                        content_type = 'text/csv', 
                        split_type = 'Line')

The error it returns is that the sagemaker object does not have a transformer attribute. What syntax should I use in the Lambda function?

Upvotes: 2

Views: 2935

Answers (1)

dingus

Reputation: 1001

Boto3 (boto3.client("sagemaker")) is the general-purpose AWS SDK for Python, covering many services. Examples that reference classes like Estimator, Transformer, Predictor, etc. are using a different library: the SageMaker Python SDK (import sagemaker). A boto3 client has no transformer attribute, which is why your code fails.

In general I'd say (almost?) anything that can be done in one can also be done in the other, as they use the same underlying service APIs. The purpose of the SageMaker Python SDK is to provide higher-level abstractions and useful utilities, for example transparently zipping and uploading a source_dir to S3 to deliver "script mode" training.

As far as I'm aware, the SageMaker Python SDK is still not pre-installed in the AWS Lambda Python runtimes by default, but it is an open-source, pip-installable package.

So you have two choices here:

  1. Continue using boto3 and create your transform job via the low-level create_transform_job API.
  2. Install sagemaker in your Python Lambda bundle (tools like AWS SAM or CDK might make this process easier) and import sagemaker instead, so you can use the Transformer and other high-level Python APIs.
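
For the first option, a minimal sketch of what the Lambda handler might look like is below. It assumes the function is triggered by an S3 event notification and mirrors the settings from the question's Transformer call. MODEL_NAME is a hypothetical placeholder, the output prefix is taken from the question, and the helper name build_transform_request is my own. Treat it as a starting point rather than a tested deployment.

```python
import time
import urllib.parse

# Assumptions for illustration: replace with your own values.
MODEL_NAME = "your-model-name"         # hypothetical placeholder for the SageMaker model
OUTPUT_S3_URI = "s3://nooz/download/"  # output prefix from the question


def build_transform_request(bucket, key, job_name):
    """Build the keyword arguments for the boto3 create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": MODEL_NAME,
        "BatchStrategy": "SingleRecord",
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"s3://{bucket}/{key}",
                }
            },
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": OUTPUT_S3_URI,
            "Accept": "text/csv",
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
        },
    }


def lambda_handler(event, context):
    # Imported here so the request builder above can be exercised without boto3.
    import boto3

    # Take the bucket and key of the uploaded object from the first S3 event record.
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])

    # Transform job names must be unique, so append a timestamp.
    job_name = f"y-test-batch-{int(time.time())}"

    client = boto3.client("sagemaker")
    response = client.create_transform_job(
        **build_transform_request(bucket, key, job_name)
    )
    return {"TransformJobArn": response["TransformJobArn"]}
```

Note that create_transform_job only starts the job and returns immediately; the Lambda does not wait for the predictions. The function's execution role also needs permission to call sagemaker:CreateTransformJob.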

Upvotes: 0
