Scaling out an Asynchronous SageMaker Endpoint

Question

I've deployed an Asynchronous SageMaker Endpoint. I want it to scale in (down to 0 instances) when no requests arrive for a period of time, and to scale out (up to 1 instance) when a request arrives.
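For context, scaling in to zero is only possible if the variant is registered as a scalable target whose `MinCapacity` is 0, because Application Auto Scaling never sets the desired instance count below `MinCapacity`, whatever a policy requests. The snippet below is a minimal sketch of that registration. It assumes the usual `endpoint/<endpoint-name>/variant/<variant-name>` resource ID; the helper names `scalable_target_request` and `register_variant` are hypothetical, while `register_scalable_target` is the real boto3 Application Auto Scaling call.

```python
def scalable_target_request(resource_id: str, min_capacity: int = 0, max_capacity: int = 1) -> dict:
    """Build keyword arguments for Application Auto Scaling's register_scalable_target.

    Hypothetical helper: resource_id is assumed to look like
    "endpoint/<endpoint-name>/variant/<variant-name>".
    """
    if min_capacity < 0 or max_capacity < min_capacity:
        raise ValueError("require 0 <= min_capacity <= max_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,  # 0 allows the variant to scale in to zero instances
        "MaxCapacity": max_capacity,  # upper clamp for any scale-out policy
    }


def register_variant(client_autoscaling, resource_id: str, min_capacity: int = 0, max_capacity: int = 1):
    """Register the variant; client_autoscaling is a boto3 "application-autoscaling" client."""
    return client_autoscaling.register_scalable_target(
        **scalable_target_request(resource_id, min_capacity, max_capacity)
    )


# Usage (requires AWS credentials; endpoint and variant names are placeholders):
# import boto3
# register_variant(boto3.client("application-autoscaling"),
#                  "endpoint/my-async-endpoint/variant/AllTraffic")
```

Registering first and attaching policies afterwards mirrors the order used in the AWS guide linked in this question.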

Following some posts online, I created the scaling policy like this:

    # Step scaling policy: each time the alarm triggers it, add one instance (ChangeInCapacity +1)
    self.scaling_policies = self.client_autoscaling.put_scaling_policy(
        PolicyName=self.policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=self.resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 60,
            "StepAdjustments": [
                # Applies whenever the metric is at or above the alarm threshold
                {
                    "MetricIntervalLowerBound": 0,
                    "ScalingAdjustment": 1,
                },
            ],
        },
    )

    # CloudWatch alarm on HasBacklogWithoutCapacity that invokes the policy above
    response = self.client_cloudwatch.put_metric_alarm(
        AlarmName=self.policy_name,
        MetricName="HasBacklogWithoutCapacity",
        Namespace="AWS/SageMaker",
        Statistic="Average",
        EvaluationPeriods=self.evaluation_periods,
        DatapointsToAlarm=self.datapoints,
        Threshold=self.target_value,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="breaching",
        Dimensions=[
            {"Name": "EndpointName", "Value": self.endpoint_name},
        ],
        Period=self.period,
        AlarmActions=[self.scaling_policies["PolicyARN"]],
    )

The endpoint is created successfully, and I can see that a scaling policy is attached to it. I can also see that the alarm is triggered, but the endpoint doesn't scale in: it is deployed with an initial instance count of 1, and I expect it to drop to 0 quickly.
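One way to reason about this behavior is to model what a `ChangeInCapacity` step policy does when its alarm fires. The function below is a simplified, hypothetical model, not AWS's implementation: step bounds are measured relative to the alarm threshold, the matching step's `ScalingAdjustment` is added to the current capacity, and the result is clamped to the registered minimum and maximum. With the single step from the policy in the question (`MetricIntervalLowerBound: 0`, `ScalingAdjustment: 1`), every adjustment is positive, so no alarm invocation can lower the instance count.

```python
import math


def step_scaling_capacity(current, metric, threshold, step_adjustments, min_capacity, max_capacity):
    """Simplified model of a ChangeInCapacity step policy driven by a
    GreaterThanOrEqualToThreshold alarm (illustrative only)."""
    breach = metric - threshold  # step interval bounds are relative to the alarm threshold
    if breach < 0:
        return current  # alarm is not in ALARM state, so the policy is not invoked
    for step in step_adjustments:
        lower = step.get("MetricIntervalLowerBound", -math.inf)
        upper = step.get("MetricIntervalUpperBound", math.inf)
        if lower <= breach < upper:
            proposed = current + step["ScalingAdjustment"]
            # The result is clamped to the scalable target's MinCapacity/MaxCapacity
            return max(min_capacity, min(max_capacity, proposed))
    return current


# The single step from the question's policy
steps = [{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}]
# Starting at 1 instance with MaxCapacity 1, a breach leaves the count at 1, never 0:
# step_scaling_capacity(1, 1, 1, steps, 0, 1) -> 1
```

The model deliberately ignores cooldowns, evaluation periods, and how `TreatMissingData` affects the alarm state; it only illustrates the direction in which this particular policy can move the count.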

Any help?

I tried to create an Asynchronous SageMaker Endpoint with a scaling policy to scale in and scale out, following these posts:

https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-autoscale.html

How to Quickly Scale SageMaker Async Endpoint from 0 to 1 Instance for a Single Request?

https://medium.com/@neethu.v.gopal/asynchronous-endpoints-for-stable-diffusion-in-aws-using-sagemaker-with-autoscaling-b0db4206648b
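For comparison, the AWS guide linked above uses a step policy on `HasBacklogWithoutCapacity` only to scale out from zero, and pairs it with a target-tracking policy on the `ApproximateBacklogSizePerInstance` metric, which is what lowers the instance count as the backlog drains. The sketch below builds the keyword arguments for such a `put_scaling_policy` call, assuming the same `ResourceId` and endpoint name as the question's code. The helper name is hypothetical, and the target value and cooldowns are illustrative placeholders rather than tuned recommendations.

```python
def backlog_target_tracking_request(
    policy_name: str,
    resource_id: str,
    endpoint_name: str,
    target_value: float = 5.0,      # illustrative: desired backlog per instance
    scale_in_cooldown: int = 600,   # illustrative, in seconds
    scale_out_cooldown: int = 300,  # illustrative, in seconds
) -> dict:
    """Build keyword arguments for Application Auto Scaling's put_scaling_policy
    (hypothetical helper) using target tracking on ApproximateBacklogSizePerInstance."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


# Usage (requires AWS credentials and a boto3 "application-autoscaling" client):
# client_autoscaling.put_scaling_policy(
#     **backlog_target_tracking_request("backlog-tracking", resource_id, endpoint_name))
```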