Reputation:
I have an auto-scaling instance group on GCP with 1 to 15 instances. The scale-out rule is CPU load more than 50%. No scale-in controls enabled. But I have all 15 instances running constantly. Monitoring shows that for the past 12 hours 5 instances had more than 80% CPU load (up to 300% (what?)) and 10 instances less than 5% CPU load. Why these 10 instances are running if they are not in use? I expect a maximum of a few spare instances while others are fully loaded. But not 2x more spare instances. Why it works so? How to make it work as expected?
The group auto-scaling options are as follows:
autoscaler:
autoscalingPolicy:
coolDownPeriodSec: 180
cpuUtilization:
utilizationTarget: 0.5
maxNumReplicas: 15
minNumReplicas: 1
mode: ON
scaleInControl:
maxScaledInReplicas:
fixed: 1
timeWindowSec: 60
creationTimestamp: '2020-12-04T01:46:57.815-08:00'
id: '***'
kind: compute#autoscaler
name: ***
recommendedSize: 10
selfLink: https://www.googleapis.com/compute/v1/projects/***/zones/europe-west4-a/autoscalers/***
status: ACTIVE
target: https://www.googleapis.com/compute/v1/projects/***/zones/europe-west4-a/instanceGroupManagers/***
zone: https://www.googleapis.com/compute/v1/projects/***/zones/europe-west4-a
Upvotes: 0
Views: 1213
Reputation: 36
It's important how do you distribute work between your instances.
Autoscaler provisions enough instances to meet the target average utilization of your instances. If your group has some instances highly utilized and some idle, it might by issue with the logic that distributes work between instances. Horizontal scaling work well assuming newly created instances are able to take part of work from the highly utilized instances.
Some things to check:
Upvotes: 1
Reputation: 1975
The feature called Predictive Auto Scaling PAS which may be the reason about the MIG scaling up with the CPU metric, still I'm not sure that your cluster has this enabled or not. PAS learn that daily (or at a specific day of the week) at this time there is a peak of load and it reacted accordingly (even if today is unusual and the peak didn't happen). PAS decide to scale up by a predictive algorithm.
If you run the following command for your MIG "your-mig" in order to verify and certify that the MIG will be scaling from the raw metric
gcloud alpha compute instance-groups managed update-autoscaling GROUP \
--cpu-utilization-predictive-method none
Upvotes: 0