Reputation: 5029
I have deployed a linear regression model on Sagemaker. Now I want to write a lambda function to make prediction on input data. Files are pulled from S3 first. Some preprocessing is done and the final input is a pandas
dataframe. According to boto3 sagemaker documentation, the payload can either be byte-like, or file. So I have tried to convert the dataframe to a byte array using code from this post
# Convert pandas dataframe to byte array
pred_np = pred_df.to_records(index=False)
pred_str = pred_np.tostring()
# Start sagemaker prediction
sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
EndpointName=SAGEMAKER_ENDPOINT,
Body=pred_str,
ContentType='text/csv',
Accept='Accept')
I printed out pred_str
which does seem like a byte array to me.
However when I run it, I got the following Algorithm Error
caused by UnicodeDecodeError
:
Caused by: 'utf8' codec can't decode byte 0xed in position 9: invalid continuation byte
The traceback shows python 2.7 not sure why that is:
Traceback (most recent call last):
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
data_iter = get_data_iterator(payload, **content_parameters)
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 99, in iterator_csv_dense_rank_2
payload = payload.decode("utf8")
File "/opt/amazon/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
Is the default decoder utf_8
? What is the right decoder I should be using? Why is it complaining about position 9?
In addition, I also tried to save the dataframe to csv file and use that as payload
pred_df.to_csv('pred.csv', index=False)
with open('pred.csv', 'rb') as f:
payload = f.read()
response = sm_runtime.invoke_endpoint(
EndpointName=SAGEMAKER_ENDPOINT,
Body=payload,
ContentType='text/csv',
Accept='Accept')
However when I ran it I got the following error:
Customer Error: Unable to parse payload. Some rows may have more columns than others and/or non-numeric values may be present in the csv data.
And again, the traceback is calling python 2.7:
Traceback (most recent call last):
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/serve.py", line 465, in invocations
data_iter = get_data_iterator(payload, **content_parameters)
File "/opt/amazon/lib/python2.7/site-packages/ai_algorithms_sdk/io/serve_helpers.py", line 123, in iterator_csv_dense_rank_2
It doesn't make sense at all because it is standard 6x78 dataframe. All rows have same number of columns. Plus none of the columns are non-numeric.
How to fix this sagemaker issue?
Upvotes: 0
Views: 1929
Reputation: 5029
I was finally able to make it work with the following code:
payload = io.StringIO()
pred_df.to_csv(payload, header=None, index=None)
sm_runtime = aws_session.client('runtime.sagemaker')
response = sm_runtime.invoke_endpoint(
EndpointName=SAGEMAKER_ENDPOINT,
Body=payload.getvalue(),
ContentType='text/csv',
Accept='Accept')
It is very import to call getvalue()
function for the payload while invoking the endpoint. Hope this helps
Upvotes: 4