Reputation: 3893
I've deployed a model using the NVIDIA Triton Inference Server on AWS SageMaker and am attempting to expose it through a REST API using AWS API Gateway. This would make it accessible to clients.
Initially, I wrote code that invokes the SageMaker endpoint directly, using the MIME type application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER} (as detailed in the AWS documentation). Sent in the Content-Type header, where {NUMBER} is the number of bytes to read as JSON before the binary data begins, this works flawlessly.
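To make the framing concrete, here is a minimal sketch of how a receiver could use json-header-size to split such a body into its JSON part and its binary part. The helper name split_triton_body and the sample payload are assumptions for illustration, not part of Triton or SageMaker, and struct stands in for NumPy to keep the sketch dependency-free.

```python
import json
import struct

MEDIA_TYPE = "application/vnd.sagemaker-triton.binary+json"


def split_triton_body(body: bytes, content_type: str):
    """Split a request body into (JSON header dict, raw tensor bytes).

    Hypothetical helper, for illustration only.
    """
    media_type, _, params = content_type.partition(";")
    if media_type.strip() != MEDIA_TYPE:
        raise ValueError("unexpected media type: " + media_type)
    key, _, value = params.strip().partition("=")
    if key != "json-header-size":
        raise ValueError("json-header-size parameter is missing")
    size = int(value)
    # The first `size` bytes are JSON; everything after is binary data.
    return json.loads(body[:size]), body[size:]


# One FP32 value occupies 4 bytes.
tensor_bytes = struct.pack("<f", -0.0024108887)
header = json.dumps({
    "inputs": [{
        "name": "np_tensor",
        "shape": [1, 1],
        "datatype": "FP32",
        "parameters": {"binary_data_size": len(tensor_bytes)},
    }]
}).encode("utf-8")
body = header + tensor_bytes
content_type = f"{MEDIA_TYPE};json-header-size={len(header)}"

parsed, binary = split_triton_body(body, content_type)
print(parsed["inputs"][0]["name"], len(binary))  # np_tensor 4
```

Without the json-header-size parameter, a receiver has no way to tell where the JSON ends and the tensor bytes begin.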
Following the AWS blog instructions, I created an API and set it to proxy my requests to the SageMaker Runtime without modification. I also added application/vnd.sagemaker-triton.binary+json to the Binary Media Types so that the body is passed through in binary form, unaltered.
However, when I test the API Gateway endpoint, I get the error: unexpected size for input 'np_tensor', expecting 4 additional bytes. This suggests that the Triton server is not receiving the correct binary data size, possibly because of the way API Gateway processes the request.
It appears that API Gateway is not preserving the Content-Type: application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER} header. Omitting this header when calling the SageMaker endpoint directly produces the same error.
Logs indicate that the header is initially present, but subsequent entries show only truncated output, which doesn't provide much help.
Here is the code snippet I used:
# Python client code
import boto3
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
import numpy as np
import json
import requests
aws_region = 'us-east-1'
# API Gateway URL
url = ""
# SageMaker Endpoint URL (commented out since we're using API Gateway)
# url = ""
# Sample dummy input data for testing
input_data = np.array([[-0.0024108887]]).astype('float32')
# Define the request body for the Triton server
json_request = {
    "inputs": [
        {
            "name": "np_tensor",
            "shape": list(input_data.shape),
            "datatype": "FP32",
            "parameters": {"binary_data_size": input_data.nbytes},
        },
    ],
    "outputs": [
        {"name": "transcription", "parameters": {"binary_data": True}},
    ],
}
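# Serialize the JSON header to bytes, then append the raw tensor bytes
json_request_bytes = json.dumps(json_request).encode("utf-8")
request_body = json_request_bytes + input_data.tobytes()
# json-header-size must be the byte length of the JSON part, not its character count
header_length = len(json_request_bytes)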
# # Not needed for AWS Gateway
# # AWS session and credentials setup
# session = boto3.Session()
# credentials = session.get_credentials()
# # AWS Request with SigV4 Authentication
# request = AWSRequest(method="POST", url=url, data=request_body)
# SigV4Auth(credentials, 'sagemaker', aws_region).add_auth(request)
# signed_headers = dict(request.headers)
# Prepare headers, including the custom Content-Type header
signed_headers = {}
signed_headers["Content-Type"] = "application/vnd.sagemaker-triton.binary+json;json-header-size={}".format(header_length)
# Send the request and print the response
response = requests.post(
    url,
    headers=signed_headers,
    data=request_body,
)
print(response.content.decode("utf8"))
My questions are:
Any insights or suggestions would be greatly appreciated.
Upvotes: 0
Views: 258
Reputation: 3893
I found the solution, thanks to this answer! Amazon API Gateway does not pass on headers to SageMaker unless explicitly told to (which is a bit counterintuitive because it's not explicitly mentioned). To pass the header on:
Open your API's "Method Execution" view.
Go to "Method Request".
Under "HTTP Request Headers", add the name of the header you want to forward, e.g. Content-Type.
Go back to "Method Execution".
Go to "Integration Request".
Under "URL request headers parameters", add the header to be mapped and its mapping, e.g. Content-Type mapped from method.request.header.Content-Type.
Click "Deploy".
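The same console steps can probably be scripted. Below is a rough sketch using boto3's apigateway client (update_method, update_integration, create_deployment). The patch paths follow API Gateway's requestParameters naming as I understand it. The function names and their arguments are placeholders I made up, and I have not run this against a live API, so treat it as a starting point rather than a verified recipe.

```python
def header_patch_ops(header_name: str):
    """Build patch operations that forward one request header.

    Returns (method_request_ops, integration_request_ops).
    Hypothetical helper, for illustration only.
    """
    method_ops = [{
        "op": "add",
        "path": f"/requestParameters/method.request.header.{header_name}",
        "value": "false",  # "false" means the header is not required
    }]
    integration_ops = [{
        "op": "add",
        "path": f"/requestParameters/integration.request.header.{header_name}",
        "value": f"method.request.header.{header_name}",
    }]
    return method_ops, integration_ops


def forward_header(rest_api_id, resource_id, http_method, stage_name,
                   header_name="Content-Type"):
    """Apply the mapping and redeploy. All IDs are caller-supplied placeholders."""
    import boto3  # imported here so the helper above works without AWS access

    client = boto3.client("apigateway")
    method_ops, integration_ops = header_patch_ops(header_name)
    # Declare the header on the method request
    client.update_method(restApiId=rest_api_id, resourceId=resource_id,
                         httpMethod=http_method, patchOperations=method_ops)
    # Map it onto the integration request
    client.update_integration(restApiId=rest_api_id, resourceId=resource_id,
                              httpMethod=http_method,
                              patchOperations=integration_ops)
    # Changes only take effect after a deployment
    client.create_deployment(restApiId=rest_api_id, stageName=stage_name)


method_ops, integration_ops = header_patch_ops("Content-Type")
print(integration_ops[0]["value"])  # method.request.header.Content-Type
```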
P.S.
Don't forget to also add an "Accept" header to both your request and the gateway settings if your return type doesn't match the input type given in "Content-Type". Otherwise, e.g. with the AWS MMS server, you will get something like:
Content type is not supported by this framework.
Please implement input_fn to to deserialize the request data or an output_fn
to serialize the response. For more information, see the SageMaker Python SDK README.
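On the client side, that could look like the sketch below. The build_headers helper, the header size of 153, and the "application/json" Accept value are placeholders I picked for illustration; use whatever your model actually returns.

```python
def build_headers(json_header_size: int, accept: str) -> dict:
    """Return request headers with both Content-Type and Accept set.

    Hypothetical helper, for illustration only.
    """
    return {
        "Content-Type": (
            "application/vnd.sagemaker-triton.binary+json;"
            f"json-header-size={json_header_size}"
        ),
        "Accept": accept,
    }


# 153 and "application/json" are placeholder values, not taken from a real request.
headers = build_headers(json_header_size=153, accept="application/json")
print(headers["Accept"])  # application/json
```

On the gateway side, the "Accept" header would need the same kind of mapping as "Content-Type", i.e. from method.request.header.Accept.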
Upvotes: 1