Reputation: 1
I've deployed an asynchronous SageMaker endpoint and I want it to scale in (to 0 instances) when nothing is requested for a period of time, and to scale out (to at most 1 instance) when something is requested.
I followed some posts online and created the scaling policy like this:
self.scaling_policies = self.client_autoscaling.put_scaling_policy(
    PolicyName=self.policy_name,
    ServiceNamespace="sagemaker",
    ResourceId=self.resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "Cooldown": 60,
        "StepAdjustments": [
            {
                "MetricIntervalLowerBound": 0,
                "ScalingAdjustment": 1,
            }
        ],
    },
)
response = self.client_cloudwatch.put_metric_alarm(
    AlarmName=self.policy_name,
    MetricName="HasBacklogWithoutCapacity",
    Namespace="AWS/SageMaker",
    Statistic="Average",
    EvaluationPeriods=self.evaluation_periods,
    DatapointsToAlarm=self.datapoints,
    Threshold=self.target_value,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="breaching",
    Dimensions=[
        {"Name": "EndpointName", "Value": self.endpoint_name},
    ],
    Period=self.period,
    AlarmActions=[self.scaling_policies["PolicyARN"]],
)
The endpoint is created successfully and I can see that a scaling policy is attached to it. I can also see that an alarm is triggered, but the endpoint doesn't scale in (it is deployed with an initial instance count of 1 and I expect it to quickly go to 0).
Any help?
I tried to create an asynchronous SageMaker endpoint with a scaling policy to scale in and scale out, following these posts:
https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-autoscale.html
How to Quickly Scale SageMaker Async Endpoint from 0 to 1 Instance for a Single Request?
Upvotes: 0
Views: 97
Reputation: 616
I had a similar issue. In my case the cause was insufficient quota.
How I debugged it: the alarm history showed that the scaling action itself had been executed:
2024-12-19 08:46:44
Action
Successfully executed action arn:aws:autoscaling:x:x:scalingPolicy:x:resource/sagemaker/endpoint/x/variant/x:policyName/x
aws application-autoscaling describe-scaling-activities --service-namespace sagemaker --resource-id endpoint/x/variant/x
The issue was insufficient quota for the specific instance type:
"StatusMessage": "Failed to set desired instance count to 3. Reason: The account-level service limit 'ml.p2.xlarge for endpoint usage' is 2 Instances, with current utilization of 0 Instances and a request delta of 3 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota. (Service: AmazonSageMaker; Status Code: 400; Error Code: ResourceLimitExceeded; Request ID: x; Proxy: null)."
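If you would rather run the same check from Python, here is a rough sketch of the equivalent of the CLI command above. It assumes `client` behaves like `boto3.client("application-autoscaling")`. The function name `failed_scaling_activities` is made up for illustration, and treating both `Failed` and `Unfulfilled` as "did not succeed" is my own choice.

```python
from typing import Any, Dict, Iterator


def failed_scaling_activities(client: Any, resource_id: str) -> Iterator[Dict[str, Any]]:
    """Yield scaling activities for a SageMaker variant that did not succeed.

    `client` is assumed to expose describe_scaling_activities() like the
    boto3 Application Auto Scaling client does.
    """
    kwargs = {"ServiceNamespace": "sagemaker", "ResourceId": resource_id}
    while True:
        page = client.describe_scaling_activities(**kwargs)
        for activity in page.get("ScalingActivities", []):
            # StatusMessage on these entries carries the reason, e.g. a quota error.
            if activity.get("StatusCode") in ("Failed", "Unfulfilled"):
                yield activity
        token = page.get("NextToken")
        if not token:
            break
        # Follow pagination until the service stops returning a token.
        kwargs["NextToken"] = token
```

To use it, pass in `boto3.client("application-autoscaling")` and a resource id such as `"endpoint/x/variant/x"`, then print `StatusCode` and `StatusMessage` for each result. A "successfully executed" alarm action only means the policy was invoked, so the failure reason shows up here and not in the alarm history.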
Upvotes: 0