AWS SageMaker autoscaling with instance metrics per instance

I am using an AWS SageMaker endpoint for inference. Depending on the amount of traffic, the endpoint should scale up and down by adding or removing instances. I am trying to use the instance metrics (CPUUtilization, MemoryUtilization or DiskUtilization) as the metric for SageMaker endpoint autoscaling. These are the predefined metrics defined here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html

The problem is that the instance metrics for a given endpoint are the sum over all running instances in that endpoint. For example, if the endpoint currently has 5 running instances, the value of CPUUtilization can range from 0 to 500%. The maximum value changes with the number of running instances, so the autoscaling policy would have to change with it.

The question is: is there any way to get a metric per instance, i.e. CPUUtilizationPerInstance, without calculating it explicitly or publishing a custom metric? Scaling up and down by setting a threshold on per-instance CPUUtilization seems like the right approach. Is there any other similar option on AWS?
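To make the arithmetic in the question concrete, here is a minimal sketch of the normalization being asked for. It assumes the aggregate value really is summed across instances, as described above; the function name and the sample numbers are hypothetical.

```python
def per_instance_utilization(aggregate_percent: float, instance_count: int) -> float:
    """Average utilization per instance, given a value summed over all instances."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return aggregate_percent / instance_count


# Hypothetical reading: 5 instances reporting 450% in total.
average = per_instance_utilization(450.0, 5)
print(average)
```

A fixed threshold on the summed value would mean something different at every instance count, whereas a threshold on the per-instance average does not.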

Upvotes: 2

Answers (3)

zhrist

Reputation: 1558

Yes, there is a way to find out the "metric per instance" and act upon it.

This is done via auto scaling policies. You have not enabled auto scaling yet, so I suggest enabling it and starting with as few initial instances as possible, such as 1.

The AWS documentation on these policies is a good place to start understanding scaling based on metrics: aws configure model autoscaling

Useful example with code for metrics

Upvotes: 0

trudolf

Reputation: 2111

This blog post describes how to define a custom metric that tracks the average CPU utilisation per instance.

tl;dr

    # Keyword argument for put_scaling_policy() on the Application Auto Scaling client;
    # endpoint_name holds the name of your endpoint.
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 90.0,  # target value for the metric specified below
        'CustomizedMetricSpecification':
        {
            'MetricName': 'CPUUtilization',
            'Namespace': '/aws/sagemaker/Endpoints',
            'Dimensions': [
                {'Name': 'EndpointName', 'Value': endpoint_name},
                {'Name': 'VariantName', 'Value': 'AllTraffic'}
            ],
            'Statistic': 'Average',  # one of 'Average' | 'Minimum' | 'Maximum' | 'SampleCount' | 'Sum'
            'Unit': 'Percent'
        },
        'ScaleInCooldown': 600,   # seconds
        'ScaleOutCooldown': 300   # seconds
    }
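For context, here is a rough sketch of how a configuration like this could be wired into a complete boto3 call. The helper names, the policy name, and the capacity limits are my own placeholders, not something from the blog post. `register_scalable_target` and `put_scaling_policy` are the Application Auto Scaling client methods. `boto3` is imported lazily so the dictionary builder can be tried without AWS credentials.

```python
def build_cpu_tracking_policy(endpoint_name, variant_name="AllTraffic", target_value=90.0):
    """Build a target-tracking configuration on the average CPUUtilization metric."""
    return {
        "TargetValue": target_value,
        "CustomizedMetricSpecification": {
            "MetricName": "CPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint_name},
                {"Name": "VariantName", "Value": variant_name},
            ],
            "Statistic": "Average",
            "Unit": "Percent",
        },
        "ScaleInCooldown": 600,
        "ScaleOutCooldown": 300,
    }


def apply_policy(endpoint_name, variant_name, policy_config, min_capacity=1, max_capacity=5):
    """Register the variant as a scalable target and attach the policy (needs AWS access)."""
    import boto3  # lazy import: only needed when actually calling AWS

    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,   # placeholder limits, adjust to your workload
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName="cpu-target-tracking",  # hypothetical policy name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=policy_config,
    )
```

With credentials in place, usage would look something like `apply_policy("my-endpoint", "AllTraffic", build_cpu_tracking_policy("my-endpoint"))`, where `my-endpoint` is a made-up name.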

Upvotes: 3

fm1ch4

Reputation: 56

There is an InvocationsPerInstance metric that shows the average number of invocations per instance when you use the 'Sum' statistic.

https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

This blog post details how you would go about load testing your endpoint to find a good target value for InvocationsPerInstance to use in autoscaling: https://aws.amazon.com/blogs/machine-learning/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling/
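As a sketch of what that could look like in practice: to my knowledge, the AWS load-testing guidance derives the target as the peak requests per second one instance can sustain, times a safety factor, times 60 (the metric is per minute). The function names, load-test numbers, and cooldown values below are placeholders of mine. `SageMakerVariantInvocationsPerInstance` is the predefined metric type for target tracking.

```python
def invocations_per_instance_target(max_rps: float, safety_factor: float) -> float:
    """Per-minute invocation target for one instance: max RPS x safety factor x 60."""
    return max_rps * safety_factor * 60


def build_invocations_policy(target_value: float) -> dict:
    """Target-tracking configuration using the predefined per-instance invocations metric."""
    return {
        "TargetValue": target_value,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 600,   # placeholder cooldowns, in seconds
        "ScaleOutCooldown": 300,
    }


# Hypothetical load-test result: one instance sustains 20 requests per second,
# and we keep a 0.5 safety factor.
target = invocations_per_instance_target(max_rps=20, safety_factor=0.5)
policy = build_invocations_policy(target)
print(policy)
```

The resulting dictionary would be passed as `TargetTrackingScalingPolicyConfiguration` to `put_scaling_policy`. Because the metric is already per instance, the target does not need to change as instances are added or removed.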

Upvotes: 2
