lucidyan

Reputation: 3893

How to configure AWS API Gateway for NVIDIA Triton's Binary Data Protocol with AWS SageMaker?

I've deployed a model using the NVIDIA Triton Inference Server on AWS SageMaker and am attempting to expose it through a REST API using AWS API Gateway. This would make it accessible to clients.

Initially, I wrote code that invokes the SageMaker endpoint directly, setting the Content-Type header to the MIME type application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER} (as detailed in the AWS documentation). Here {NUMBER} is the number of leading bytes of the body to be read as JSON; the binary data follows. This works flawlessly.

Following the AWS blog instructions, I created an API and set it to proxy my requests to the SageMaker Runtime without modification. Additionally, I added application/vnd.sagemaker-triton.binary+json to the Binary Media Types so that the body is proxied in binary form without alteration.

However, when I test the API Gateway endpoint, I get the following error: unexpected size for input 'np_tensor', expecting 4 additional bytes. This suggests that the Triton server is not receiving the binary data size it expects, possibly because of the way API Gateway processes the request.

It appears that API Gateway is not preserving the header Content-Type: application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER}. Omitting this header when calling the SageMaker endpoint directly produces the same error.

Logs indicate that the header is initially present, but subsequent entries show only truncated output, which doesn't provide much help.

Here is the code snippet I used:

python client code

import boto3
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
import numpy as np
import json
import requests

aws_region = 'us-east-1'

# API Gateway URL
url = ""

# SageMaker Endpoint URL (commented out since we're using API Gateway)
# url = ""

# Sample dummy input data for testing
input_data = np.array([[-0.0024108887]]).astype('float32')

# Define the request body for the Triton server
json_request = {
    "inputs": [
        {
            "name": "np_tensor",
            "shape": list(input_data.shape),
            "datatype": "FP32",
            "parameters": {"binary_data_size": input_data.nbytes},
        },
    ],
    "outputs": [
        {"name": "transcription", "parameters": {"binary_data": True}},
    ],
}

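# Serialize the JSON header to bytes, then append the raw tensor bytes
json_request_bytes = json.dumps(json_request).encode("utf-8")
request_body = json_request_bytes + input_data.tobytes()
# json-header-size counts bytes, so measure the encoded header rather than the str
header_length = len(json_request_bytes)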

# # Not needed for AWS Gateway
# # AWS session and credentials setup
# session = boto3.Session()
# credentials = session.get_credentials()

# # AWS Request with SigV4 Authentication
# request = AWSRequest(method="POST", url=url, data=request_body)
# SigV4Auth(credentials, 'sagemaker', aws_region).add_auth(request)

# signed_headers = dict(request.headers)

# Prepare headers, including the custom Content-Type header
signed_headers = {}
signed_headers["Content-Type"] = "application/vnd.sagemaker-triton.binary+json;json-header-size={}".format(header_length)

# Send the request and print the response
response = requests.post(
    url, 
    headers=signed_headers,
    data=request_body
)

print(response.content.decode("utf8"))
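For reading a successful response, here is a minimal sketch. It assumes the response comes back in the same layout as the request: a Content-Type that carries json-header-size, a JSON header of that many bytes, then raw tensor bytes (the AWS documentation describes this for direct SageMaker invocations; whether API Gateway returns that response header depends on its response mapping). split_triton_binary is a hypothetical helper name, not part of any library.

```python
import json
import re

# Matches the size parameter in e.g.
# application/vnd.sagemaker-triton.binary+json;json-header-size=123
_HEADER_SIZE = re.compile(r"json-header-size=(\d+)")


def split_triton_binary(content_type: str, body: bytes):
    """Split a payload into (JSON header as dict, trailing raw bytes).

    Hypothetical helper: if the Content-Type has no json-header-size
    parameter, the whole body is assumed to be plain JSON.
    """
    match = _HEADER_SIZE.search(content_type or "")
    if match is None:
        return json.loads(body), b""
    size = int(match.group(1))
    return json.loads(body[:size]), body[size:]


if __name__ == "__main__":
    # Synthetic payload standing in for a real response.
    header = json.dumps({"outputs": [{"name": "transcription"}]}).encode("utf-8")
    content_type = (
        "application/vnd.sagemaker-triton.binary+json;json-header-size=%d" % len(header)
    )
    parsed, raw = split_triton_binary(content_type, header + b"\x01\x02\x03\x04")
    print(parsed["outputs"][0]["name"], len(raw))
```

With the client above, this would be called as split_triton_binary(response.headers.get("Content-Type", ""), response.content) instead of decoding the whole body as UTF-8, which can fail once binary output is present.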

My questions are:

  1. How can I ensure that API Gateway preserves the custom Content-Type header when proxying requests to SageMaker?
  2. Are there any additional configurations or settings in API Gateway that I might be missing to handle this type of request?
  3. Has anyone successfully configured a similar setup with API Gateway and SageMaker using Triton's binary data extension?

Any insights or suggestions would be greatly appreciated.

Upvotes: 0

Views: 258

Answers (1)

lucidyan

Reputation: 3893

I found the solution, thanks to this answer! Amazon API Gateway does not forward request headers to SageMaker unless it is explicitly told to, which is a bit counterintuitive because this requirement is not explicitly mentioned. To pass the header on:

  1. Open your API's "Method Execution"

  2. Go to "Method Request"

  3. Under "HTTP Request Headers", add the name of the header you want to pass through, e.g. Content-Type

  4. Go back to "Method Execution"

  5. Go to "Integration Request"

  6. Under "URL request headers parameters", add the header to be mapped and its mapping, e.g.:

    Content-Type -> method.request.header.Content-Type

  7. Click "Deploy"
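The console steps above can also be sketched in code. This is a rough outline, assuming a REST API (not an HTTP API) and the boto3 "apigateway" client, whose update_method, update_integration and create_deployment calls take the arguments shown. The function names, and the API ID, resource ID and stage name passed in, are hypothetical placeholders.

```python
def header_passthrough_patches(header_name, required=False):
    """Build patch operations that declare a header on the method request
    and map it onto the integration request."""
    method_param = "method.request.header." + header_name
    integration_param = "integration.request.header." + header_name
    method_ops = [{
        "op": "add",
        "path": "/requestParameters/" + method_param,
        "value": "true" if required else "false",  # whether the header is required
    }]
    integration_ops = [{
        "op": "add",
        "path": "/requestParameters/" + integration_param,
        "value": method_param,  # forward the caller's header unchanged
    }]
    return method_ops, integration_ops


def apply_header_passthrough(client, rest_api_id, resource_id, http_method,
                             stage_name, header_name="Content-Type"):
    """Apply the patches with an API Gateway client, then redeploy the stage."""
    method_ops, integration_ops = header_passthrough_patches(header_name)
    client.update_method(restApiId=rest_api_id, resourceId=resource_id,
                         httpMethod=http_method, patchOperations=method_ops)
    client.update_integration(restApiId=rest_api_id, resourceId=resource_id,
                              httpMethod=http_method, patchOperations=integration_ops)
    # Changes only take effect after a new deployment.
    client.create_deployment(restApiId=rest_api_id, stageName=stage_name)
```

A call might look like apply_header_passthrough(boto3.client("apigateway"), "<api-id>", "<resource-id>", "POST", "<stage>"). Passing header_name="Accept" would forward the Accept header in the same way.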


P.S.

Don't forget to also add an "Accept" header, both in your request and in the gateway settings, if your return type doesn't match the input type defined in "Content-Type". Otherwise, e.g. with the AWS MMS server, you will get something like:

Content type is not supported by this framework. 

Please implement input_fn to to deserialize the request data or an output_fn 

to serialize the response. For more information, see the SageMaker Python SDK README.

Upvotes: 1
