How to make predictions with SageMaker on a pandas DataFrame

Question

I am using SageMaker to train and deploy my machine learning model. Prediction will be run by a Lambda function as a scheduled job (every hour). The process is as follows:

1. Pull new data from S3 since the last prediction
2. Preprocess, aggregate, and create the prediction data set
3. Call the SageMaker endpoint and make predictions
4. Either save the results to S3 or insert them into a database table

From what I have found, the input typically comes either from the Lambda payload:

data = json.loads(json.dumps(event))
payload = data['data']
print(payload)

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)

or from a file read from S3:

my_bucket = 'pred_data'  # substitute your S3 bucket name (get_object expects the name as a string)

obj = client.get_object(Bucket=my_bucket, Key='foo.csv')
lines= obj['Body'].read().decode('utf-8').splitlines()
reader = csv.reader(lines)
file = io.StringIO('\n'.join(lines))  # StringIO needs a single string, not a list of lines

response = runtime.invoke_endpoint(EndpointName=ENDPOINT,
                                   ContentType='*/*',
                                   Body=file.getvalue())
output = response['Body'].read().decode('utf-8')
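
For comparison, here is a shorter sketch of the S3 variant. It assumes the file is already a CSV in exactly the layout the endpoint expects, so the raw bytes can be forwarded without csv.reader or StringIO. The function name and its parameters are hypothetical. The two clients are assumed to come from boto3.client('s3') and boto3.client('sagemaker-runtime').

```python
def predict_from_s3_csv(s3_client, runtime, bucket, key, endpoint_name):
    """Forward a CSV object from S3 to a SageMaker endpoint unchanged.

    Assumption: the stored file needs no further preprocessing and the
    endpoint accepts 'text/csv'. Both clients are passed in so the
    function can be exercised without real AWS access.
    """
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    csv_bytes = obj["Body"].read()  # raw bytes; no decode needed just to forward them

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_bytes,
    )
    # The response body is a byte stream, so decode it to get text back.
    return response["Body"].read().decode("utf-8")
```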

Since I will be pulling raw data from S3 and preprocessing it, the result will be a pandas DataFrame. Is it possible to pass this directly as the input to invoke_endpoint? I could upload the aggregated dataset to another S3 bucket, but does it then have to go through the decoding, csv.reader, StringIO and so on, as in the example I found, or is there an easier way? Is the decode step really necessary to get the output?
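
To make the question concrete, this is roughly what I have in mind: a minimal sketch, not a confirmed solution. It assumes the endpoint accepts 'text/csv' without a header row or index column and returns comma- or newline-separated numbers, which depends on the model container. The function names are hypothetical, and runtime is assumed to be boto3.client('sagemaker-runtime').

```python
def dataframe_to_csv_body(df):
    """Serialize a pandas DataFrame to headerless CSV bytes.

    DataFrame.to_csv returns a string when no path is given, so no
    temporary file or S3 round trip is involved.
    """
    return df.to_csv(header=False, index=False).encode("utf-8")


def predict_dataframe(df, runtime, endpoint_name):
    """Send a DataFrame to a SageMaker endpoint and parse the reply.

    Assumption: the endpoint replies with comma- or newline-separated
    numeric predictions; other containers may return JSON instead.
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=dataframe_to_csv_body(df),
    )
    # response["Body"] yields bytes, hence the decode before parsing.
    text = response["Body"].read().decode("utf-8")
    values = text.replace("\n", ",").split(",")
    return [float(v) for v in values if v.strip()]
```

With this shape, the Lambda handler would call predict_dataframe(df, runtime, ENDPOINT_NAME) right after the aggregation step and then write the returned list to S3 or the database.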
