Reputation: 11023
I am using manual scaling in an Azure AKS cluster, which can scale up to 60 nodes.
The scaling command worked fine:
az aks scale --resource-group RG-1 --name KS-3 --node-count 46
- Finished ..
{
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "count": 46,
      ...
and the output reported a node count of 46.
The status also shows "Succeeded":
az aks show --name KS-3 --resource-group RG-1 -o table
Name Location ResourceGroup KubernetesVersion ProvisioningState Fqdn
---------- ---------- ------------------ ------------------- ------------------- -------------------------
KS-3 xxxxxxx RG-1 1.16.13 Succeeded xxx.azmk8s.io
However, when I look at kubectl get nodes, it shows only 44 nodes:
kubectl get nodes | grep -c 'aks'
44
with 7 nodes in "Ready,SchedulingDisabled" state (and the rest in Ready state):
kubectl get nodes | grep -c "Ready,SchedulingDisabled"
7
When I try to scale the cluster down to 45 nodes, it gives this error:
Deployment failed. Correlation ID: xxxx-xxx-xxx-x. Node 'aks-nodepool1-xxxx-vmss000yz8' failed to be drained with error: 'nodes "aks-nodepool1-xxxx-vmss000yz8" not found'
I am not sure what put the cluster into this inconsistent state or how to go about debugging it.
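One way I could imagine checking for drift is to compare the node names Kubernetes reports against the instances in the VM scale set backing the node pool. This is only a sketch; the node resource group and scale set names below are placeholders that have to be looked up for your cluster:

# Node names Kubernetes knows about
kubectl get nodes -o name

# Find the node resource group and the scale set backing the node pool
az aks show --resource-group RG-1 --name KS-3 --query nodeResourceGroup -o tsv
az vmss list --resource-group <node-resource-group> -o table

# Computer names of the scale set instances, to compare with kubectl's list
az vmss list-instances --resource-group <node-resource-group> --name <vmss-name> --query "[].osProfile.computerName" -o tsv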
Upvotes: 1
Views: 737
Reputation: 11023
It happened because two of the nodes were in a corrupted state. We had to delete these nodes from the VM scale set in the node resource group associated with our AKS cluster. I am still not sure why the nodes got into this state, though.
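For reference, a rough sketch of how such instances can be removed with the CLI (the node resource group name, scale set name, and instance IDs below are placeholders and must be looked up first):

# Map scale set instance IDs to node computer names to identify the corrupted instances
az vmss list-instances --resource-group <node-resource-group> --name aks-nodepool1-xxxx-vmss --query "[].{id:instanceId, name:osProfile.computerName}" -o table

# Delete the corrupted instances from the scale set
az vmss delete-instances --resource-group <node-resource-group> --name aks-nodepool1-xxxx-vmss --instance-ids <id1> <id2>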
Upvotes: 4