ddd

Reputation: 5029

Why can't I invoke a SageMaker endpoint with either bytes or a file as the payload?

I have deployed a linear regression model on SageMaker. Now I want to write a Lambda function that makes predictions on input data. The files are pulled from S3 first, some preprocessing is done, and the final input is a pandas dataframe. According to the boto3 SageMaker documentation, the payload can be either bytes-like or a file. So I tried to convert the dataframe to a byte array using code from this post:

# Convert pandas dataframe to byte array
pred_np = pred_df.to_records(index=False)
pred_str = pred_np.tostring()

# Start sagemaker prediction
sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=pred_str,
    ContentType='text/csv',
    Accept='Accept')

I printed out pred_str, and it does look like a byte array to me.

However, when I run it, I get the following algorithm error, caused by a UnicodeDecodeError:

Caused by: 'utf8' codec can't decode byte 0xed in position 9: invalid continuation byte

The traceback shows Python 2.7, and I'm not sure why:

Traceback (most recent call last):
  File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
    data_iter = get_data_iterator(payload, **content_parameters)
  File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 99, in iterator_csv_dense_rank_2
    payload = payload.decode("utf8")
  File "/opt/amazon/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

Is the default decoder utf_8? What is the right decoder I should be using? Why is it complaining about position 9?
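A likely explanation, sketched with the standard library only: `to_records(index=False).tostring()` does not produce CSV text at all. It returns the record array's raw memory, i.e. the binary encoding of each number (8 bytes per float64 value) with no commas or newlines, while the traceback shows the container calling `payload.decode("utf8")` before parsing the CSV. Arbitrary binary float data is generally not valid UTF-8, so no choice of decoder would turn it into a usable CSV payload; the exact failing position (9 here) just depends on where the first invalid byte sequence happens to fall. The example below uses `struct` to build the same kind of raw little-endian float64 bytes from made-up sample values (an assumption standing in for the real dataframe) and compares them with proper CSV text.

```python
import csv
import io
import struct

# Hypothetical sample data standing in for a small numeric dataframe.
rows = [[1.5, 2.25, 3.0], [4.0, 5.5, 6.75]]

# Roughly what to_records(...).tostring() yields for float64 columns:
# each value's raw little-endian float64 bytes, concatenated, no separators.
raw = b"".join(struct.pack("<d", v) for row in rows for v in row)

# What a text/csv endpoint expects: one line per row, commas between values.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
csv_text = buf.getvalue()


def try_utf8(data: bytes) -> str:
    """Return the decoded text, or a description of the decode failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError at position {exc.start}"


print(try_utf8(raw))     # the raw binary payload fails to decode
print(repr(csv_text))    # the CSV payload is plain text
```

The binary payload fails as soon as the decoder reaches a byte that cannot start a UTF-8 sequence, whereas the CSV string decodes and splits into rows cleanly.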

I also tried saving the dataframe to a CSV file and using that as the payload:

pred_df.to_csv('pred.csv', index=False)
with open('pred.csv', 'rb') as f:
    payload = f.read()
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=payload,
    ContentType='text/csv',
    Accept='Accept')

However, when I ran it, I got the following error:

Customer Error: Unable to parse payload. Some rows may have more columns than others and/or non-numeric values may be present in the csv data.

And again, the traceback points to Python 2.7:

Traceback (most recent call last):
  File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
    data_iter = get_data_iterator(payload, **content_parameters)
  File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 123, in iterator_csv_dense_rank_2

This doesn't make sense to me, because it is a standard 6x78 dataframe: all rows have the same number of columns, and none of the columns are non-numeric. How can I fix this SageMaker issue?
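One plausible cause of the second error: by default, `to_csv` writes a header row containing the column names, and those names are non-numeric text, which matches the "non-numeric values" part of the message even though every data row is numeric. The `check_numeric_csv` helper below is hypothetical and is not SageMaker's parser; it is a rough local stand-in that flags the two problems the error message names (unequal column counts and non-numeric values), so a payload can be inspected before it is sent.

```python
import csv
import io
from typing import List


def check_numeric_csv(payload: str) -> List[str]:
    """Hypothetical pre-flight check: report rows that would trip a strict
    'numeric-only, equal-width' CSV parser."""
    problems = []
    width = None
    for lineno, row in enumerate(csv.reader(io.StringIO(payload)), start=1):
        if width is None:
            width = len(row)  # first row defines the expected width
        elif len(row) != width:
            problems.append(f"line {lineno}: {len(row)} columns, expected {width}")
        for value in row:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {lineno}: non-numeric value {value!r}")
    return problems


# Made-up payloads: the same data with and without a header row.
with_header = "x1,x2,x3\n0.1,0.2,0.3\n0.4,0.5,0.6\n"
without_header = "0.1,0.2,0.3\n0.4,0.5,0.6\n"
print(check_numeric_csv(with_header))     # flags x1, x2, x3 on line 1
print(check_numeric_csv(without_header))  # []
```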

Upvotes: 0

Views: 1929

Answers (1)

ddd

Reputation: 5029

I was finally able to make it work with the following code:

import io

# Write the dataframe as CSV text: no header row and no index column
payload = io.StringIO()
pred_df.to_csv(payload, header=False, index=False)

sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=payload.getvalue(),  # pass the CSV string, not the StringIO object
    ContentType='text/csv',
    Accept='Accept')

It is very important to call getvalue() on the payload when invoking the endpoint. Hope this helps.
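The working version differs from the earlier attempts in two ways: the payload is actual CSV text (`getvalue()` returns the string written into the `StringIO`), and the header row with the column names is dropped. Below is a sketch of the same idea wrapped in a reusable helper. `dataframe_to_csv_payload`, `predict`, `RecordingClient` and `"my-endpoint"` are hypothetical names, and `Accept="application/json"` is an assumption (the `'Accept'` value in the code above is not a real MIME type, so check which response formats your algorithm supports). The recording stub only captures the call arguments so the example runs without AWS; with a real boto3 `sagemaker-runtime` client, `predict` returns the `invoke_endpoint` response instead.

```python
import io

import pandas as pd


def dataframe_to_csv_payload(df: pd.DataFrame) -> bytes:
    """Serialize a numeric dataframe as headerless, index-free CSV bytes."""
    buf = io.StringIO()
    df.to_csv(buf, header=False, index=False)
    # getvalue() returns the CSV text; encode it so Body is bytes.
    return buf.getvalue().encode("utf-8")


def predict(runtime_client, endpoint_name: str, df: pd.DataFrame):
    """Send df to a text/csv endpoint (hypothetical helper)."""
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=dataframe_to_csv_payload(df),
        ContentType="text/csv",
        Accept="application/json",  # assumption: use a format your model supports
    )


class RecordingClient:
    """Hypothetical stand-in for boto3.client("sagemaker-runtime")."""

    def __init__(self):
        self.calls = []

    def invoke_endpoint(self, **kwargs):
        self.calls.append(kwargs)  # record what would have been sent
        return {"ContentType": kwargs["Accept"]}


sample = pd.DataFrame({"x1": [0.1, 0.4], "x2": [0.2, 0.5]})
client = RecordingClient()
predict(client, "my-endpoint", sample)
print(client.calls[0]["Body"])
```

Keeping serialization in its own function makes it easy to inspect or unit-test the exact bytes sent to the endpoint before involving AWS at all.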

Upvotes: 4
