Deploying a TensorFlow model on a SageMaker async endpoint and including an inference.py script

I am trying to deploy a TensorFlow model to an async endpoint on SageMaker.

I've previously deployed the same model to a real-time endpoint using the following code:

from sagemaker.tensorflow.serving import Model

tensorflow_serving_model = Model(model_data=model_artifact,
                                 entry_point='inference.py',
                                 source_dir='code',
                                 role=role,
                                 framework_version='2.3',
                                 sagemaker_session=sagemaker_session)
predictor = tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

Using the source_dir argument, I was able to include my inference.py and requirements.txt files alongside the model.
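For context, the code/ directory is just:

code/
    inference.py
    requirements.txt

and inference.py implements the handler hooks that the TensorFlow Serving container picks up. A simplified sketch (the bodies below are placeholders, not my actual pre/post-processing):

import json

def input_handler(data, context):
    # Turn the incoming request body into the JSON payload TensorFlow Serving expects
    payload = json.loads(data.read().decode('utf-8'))
    return json.dumps({'instances': payload['inputs']})

def output_handler(response, context):
    # Return the TensorFlow Serving response body and content type to the client
    return response.content, context.accept_header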

What I am trying to do now: I am trying to deploy the same model to an async endpoint, following the doc and this blog example. I used the following snippets:

import time
from time import gmtime, strftime

import boto3
from sagemaker.image_uris import retrieve

sm_client = boto3.client('sagemaker')

deploy_instance_type = 'ml.m5.xlarge'
tensorflow_inference_image_uri = retrieve('tensorflow',
                                          region,
                                          version='2.8',
                                          py_version='py3',
                                          instance_type=deploy_instance_type,
                                          accelerator_type=None,
                                          image_scope='inference')

container = tensorflow_inference_image_uri
model_name = 'sagemaker-{0}'.format(str(int(time.time())))

# Create model
create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': model_artifact,
        'Environment': {
            'TS_MAX_REQUEST_SIZE': '100000000',  # default max request size is 6 MB for TorchServe; raise it to support the 70 MB input payload
            'TS_MAX_RESPONSE_SIZE': '100000000',
            'TS_DEFAULT_RESPONSE_TIMEOUT': '1000'
        }
    },    
)
# Create endpoint config
endpoint_config_name = f"AsyncEndpointConfig-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": f"s3://{bucket}/{bucket_prefix}/output",
            #  Optionally specify Amazon SNS topics
            "NotificationConfig": {
              "SuccessTopic": success_topic,
              "ErrorTopic": error_topic,
            }
        },
        "ClientConfig": {
            "MaxConcurrentInvocationsPerInstance": 2
        }
    }
)
# Create endpoint
endpoint_name = f"sm-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"
create_endpoint_response = sm_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)

The problem I am having: I cannot specify a source directory containing my inference.py and my requirements.txt when deploying the model to an async endpoint this way.

I am fairly sure I can't just include the code/ directory in the model .tar file; according to the docs here, the only way is through the source_dir argument when initializing the SDK's Model class.

My question: how can I use my code/ directory containing my inference.py with my TensorFlow model on an async endpoint?

Upvotes: 0

Views: 595

Answers (1)

Marc Karp

Reputation: 1314

The reason you do not have the source_dir option is that you are now deploying the model with boto3 instead of the SageMaker Python SDK, which you used initially.

You can deploy your model to an async endpoint using the SDK just as you did previously; the only difference is that you pass an AsyncInferenceConfig to deploy().

You can use something like:

from sagemaker.tensorflow.serving import Model

tensorflow_serving_model = Model(model_data=model_artifact,
                                 entry_point='inference.py',
                                 source_dir='code',
                                 role=role,
                                 framework_version='2.3',
                                 sagemaker_session=sagemaker_session)


from datetime import datetime

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path=f"s3://{s3_bucket}/{bucket_prefix}/output",
    max_concurrent_invocations_per_instance=4,
    # Optionally specify Amazon SNS topics
    # notification_config={
    #     "SuccessTopic": "arn:aws:sns:<aws-region>:<account-id>:<topic-name>",
    #     "ErrorTopic": "arn:aws:sns:<aws-region>:<account-id>:<topic-name>",
    # }
)

# Any unique endpoint name works; here it is just timestamped
endpoint_name = "sm-async-{}".format(datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))

async_predictor = tensorflow_serving_model.deploy(
    async_inference_config=async_config,
    instance_type="ml.m5.xlarge",
    initial_instance_count=1,
    endpoint_name=endpoint_name,
)
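Once the endpoint is up, you can invoke it through the returned async predictor. A minimal sketch, assuming the request payload has already been uploaded to S3 (input_s3_uri below is a placeholder for that S3 URI):

from sagemaker.async_inference.waiter_config import WaiterConfig

# Point the endpoint at a payload that already sits in S3
response = async_predictor.predict_async(input_path=input_s3_uri)

# The prediction is written to the S3 output path configured above
print(response.output_path)

# Optionally poll until the result is available
result = response.get_result(WaiterConfig(max_attempts=60, delay=15))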

Upvotes: 0
