Francesco Camussoni

Reputation: 1

Scaling an Asynchronous SageMaker Endpoint down to zero instances

I've deployed an Asynchronous SageMaker Endpoint. I want it to scale in (to 0 instances) when nothing has been requested for a period of time, and to scale out (back to 1 instance) when a request arrives.

I've followed some posts online and created the scaling policy like this:

    # Step scaling policy: add one instance each time the alarm below fires
    self.scaling_policies = self.client_autoscaling.put_scaling_policy(
        PolicyName=self.policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=self.resource_id,  # "endpoint/<endpoint-name>/variant/<variant-name>"
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 60,
            "StepAdjustments": [
                {
                    "MetricIntervalLowerBound": 0,
                    "ScalingAdjustment": 1,
                }
            ],
        },
    )
    # CloudWatch alarm on HasBacklogWithoutCapacity that invokes the policy above
    response = self.client_cloudwatch.put_metric_alarm(
        AlarmName=self.policy_name,
        MetricName="HasBacklogWithoutCapacity",
        Namespace="AWS/SageMaker",
        Statistic="Average",
        EvaluationPeriods=self.evaluation_periods,
        DatapointsToAlarm=self.datapoints,
        Threshold=self.target_value,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="breaching",
        Dimensions=[
            {"Name": "EndpointName", "Value": self.endpoint_name},
        ],
        Period=self.period,
        AlarmActions=[self.scaling_policies["PolicyARN"]],
    )
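
A minimal pure-Python model of how a ChangeInCapacity step policy changes capacity when its alarm fires may help explain the behavior. It is a simplification under stated assumptions: the breach is the metric value minus the alarm threshold, a step matches when lower bound <= breach < upper bound, the matching ScalingAdjustment is added to the current capacity, and the result is clamped to the scalable target's MinCapacity/MaxCapacity. The threshold of 1 and the 0..2 capacity range in the demo are hypothetical, since self.target_value and the registered range aren't shown in the question.

```python
def apply_step_policy(
    current: int,
    metric_value: float,
    threshold: float,
    steps: list,
    min_capacity: int,
    max_capacity: int,
) -> int:
    """Simplified model of a ChangeInCapacity step policy invoked by an alarm."""
    # Step bounds are offsets from the alarm threshold, not absolute metric values.
    breach = metric_value - threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            desired = current + step["ScalingAdjustment"]
            # The result is clamped to the scalable target's registered range.
            return max(min_capacity, min(max_capacity, desired))
    # No step matches this breach, so capacity is left unchanged.
    return current


# The single step from the question: any breach >= 0 adds one instance.
QUESTION_STEPS = [{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}]

if __name__ == "__main__":
    # Hypothetical values: threshold 1, metric 1 (backlog, no capacity), range 0..2.
    for current in (0, 1, 2):
        new = apply_step_policy(current, 1.0, 1.0, QUESTION_STEPS, 0, 2)
        print(f"{current} -> {new}")
```

With the single +1 step from the question, no input ever returns fewer instances than the current count, so in this model the policy can only scale out; reaching 0 needs a separate scale-in mechanism and a registered MinCapacity of 0.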

The endpoint is created successfully and I can see a scaling policy attached to it. I've also seen the alarm trigger, but the endpoint doesn't scale in: it is deployed with an initial instance count of 1, and I expect it to drop to 0 quickly.

Any help?

I tried to create an Asynchronous SageMaker Endpoint with a scaling policy that scales in and out, following these posts:

https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-autoscale.html

How to Quickly Scale SageMaker Async Endpoint from 0 to 1 Instance for a Single Request?

https://medium.com/@neethu.v.gopal/asynchronous-endpoints-for-stable-diffusion-in-aws-using-sagemaker-with-autoscaling-b0db4206648b
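
The AWS guide linked above pairs the HasBacklogWithoutCapacity step policy (used only to scale out from 0) with a target-tracking policy on ApproximateBacklogSizePerInstance, and registers the variant as a scalable target with MinCapacity=0. The question's snippet doesn't show the register_scalable_target call, so that is worth checking. Below is a sketch of those two calls, written as request builders plus a function that takes an Application Auto Scaling client. The policy name, the target value of 5.0, MaxCapacity of 1 and the cooldowns are illustrative assumptions, not recommendations.

```python
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def scalable_target_request(resource_id: str, max_capacity: int = 1) -> dict:
    """Arguments for register_scalable_target that let the variant reach 0 instances."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,  # "endpoint/<endpoint-name>/variant/<variant-name>"
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": 0,  # scale-in never goes below the registered minimum
        "MaxCapacity": max_capacity,
    }


def backlog_tracking_request(
    resource_id: str,
    endpoint_name: str,
    target_backlog_per_instance: float = 5.0,  # illustrative value
    policy_name: str = "backlog-target-tracking",  # hypothetical name
) -> dict:
    """Arguments for put_scaling_policy tracking ApproximateBacklogSizePerInstance."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog_per_instance,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            # Illustrative cooldowns in seconds; tune for the workload.
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    }


def configure_scale_in_to_zero(autoscaling_client, resource_id: str, endpoint_name: str) -> dict:
    """Register the target with MinCapacity=0, then attach the tracking policy.

    autoscaling_client is expected to be boto3.client("application-autoscaling").
    """
    autoscaling_client.register_scalable_target(**scalable_target_request(resource_id))
    return autoscaling_client.put_scaling_policy(
        **backlog_tracking_request(resource_id, endpoint_name)
    )
```

With boto3 installed and credentials configured, something like `configure_scale_in_to_zero(boto3.client("application-autoscaling"), "endpoint/my-endpoint/variant/AllTraffic", "my-endpoint")` would apply it (the endpoint and variant names are placeholders). The existing step policy stays in place to bring the endpoint back up from 0.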

Answers (1)

Eyal Solomon

Reputation: 616

Had a similar issue; in my case it was insufficient quota.

How I debugged it:

  1. The CloudWatch "scale up" alarm triggered correctly, but nothing scaled on the SageMaker endpoint side. The alarm history showed the action as executed:

     2024-12-19 08:46:44  Action  Successfully executed action arn:aws:autoscaling:x:x:scalingPolicy:x:resource/sagemaker/endpoint/x/variant/x:policyName/x

  2. I described the scaling activities at the endpoint level:

     aws application-autoscaling describe-scaling-activities --service-namespace sagemaker --resource-id endpoint/x/variant/x
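
The two checks above can also be scripted with boto3. This is a sketch under assumptions: the clients are passed in (expected to be boto3.client("cloudwatch") and boto3.client("application-autoscaling")), only the first page of results is read, and the function names are hypothetical.

```python
def alarm_action_history(cloudwatch_client, alarm_name: str, max_records: int = 10) -> list:
    """Step 1: recent 'Action' entries from the alarm history, newest first."""
    resp = cloudwatch_client.describe_alarm_history(
        AlarmName=alarm_name,
        HistoryItemType="Action",
        ScanBy="TimestampDescending",
        MaxRecords=max_records,
    )
    return [f"{item['Timestamp']} {item['HistorySummary']}" for item in resp["AlarmHistoryItems"]]


def unsuccessful_scaling_activities(autoscaling_client, resource_id: str) -> list:
    """Step 2: status messages of scaling activities that did not succeed.

    Only the first page is read; follow NextToken for older activities.
    """
    resp = autoscaling_client.describe_scaling_activities(
        ServiceNamespace="sagemaker", ResourceId=resource_id
    )
    return [
        f"{a['StatusCode']}: {a.get('StatusMessage', '')}"
        for a in resp["ScalingActivities"]
        if a["StatusCode"] in ("Failed", "Unfulfilled")
    ]
```

An alarm action reported as "Successfully executed" only means the policy was invoked; the scaling activity itself can still fail, which is what the second function surfaces.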

The issue was insufficient quota for that specific instance type:

"StatusMessage": "Failed to set desired instance count to 3. Reason: The account-level service limit 'ml.p2.xlarge for endpoint usage' is 2 Instances, with current utilization of 0 Instances and a request delta of 3 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota. (Service: AmazonSageMaker; Status Code: 400; Error Code: ResourceLimitExceeded; Request ID: x; Proxy: null)."
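
The arithmetic in that message is simple: the limit is 2 instances, current utilization is 0, and the request asks for 3, so 0 + 3 > 2. The sketch below encodes that reading (an assumption about how the limit check works, inferred from the message) and looks up the quota's value through the Service Quotas API, assuming the name listed there matches the one quoted in the error. The client is expected to be boto3.client("service-quotas"); the function names are hypothetical.

```python
def exceeds_quota(limit: int, current_utilization: int, request_delta: int) -> bool:
    """Assumed check: a request fails when utilization plus the delta exceeds the limit."""
    return current_utilization + request_delta > limit


def sagemaker_quota_value(quotas_client, quota_name: str):
    """Return the applied value of a SageMaker quota by name, or None if not found."""
    kwargs = {"ServiceCode": "sagemaker"}
    while True:
        resp = quotas_client.list_service_quotas(**kwargs)
        for quota in resp.get("Quotas", []):
            if quota["QuotaName"] == quota_name:
                return quota["Value"]
        # Follow pagination until the quota is found or pages run out.
        next_token = resp.get("NextToken")
        if not next_token:
            return None
        kwargs["NextToken"] = next_token
```

For the error above, `exceeds_quota(2, 0, 3)` is True. ListServiceQuotas returns applied values, so if the quota isn't returned, list_aws_default_service_quotas can be used to see the default.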
