Reputation: 131
A few days ago our AKS cluster suffered backend "downtime", which a support engineer on the Azure team confirmed. The downtime seems to have primarily affected our cluster's LoadBalancer. I first noticed the problem when I tried to create a new node pool on the cluster, which failed with this error message:
{
  "status": "Failed",
  "error": {
    "code": "ResourceOperationFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "ReconcileStandardLoadBalancerError",
        "message": "Reconcile standard load balancer failed. Details: outboundReconciler retry failed: Category: ClientError; SubCode: InvalidRequestFormat_DuplicateResourceName; Dependency: Microsoft.Network/LoadBalancers; OrginalError: Code=\"InvalidRequestFormat\" Message=\"Cannot parse the request.\" Details=[{\"code\":\"DuplicateResourceName\",\"message\":\"Resource /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (REDACTED-PUBLIC-IP-RESOURCE-NAME).\"}]; AKSTeam: Networking."
      }
    ]
  }
}
Since this error occurred, we have been completely unable to create a new node pool on this cluster.
As far as I can tell, the resource it references, a public IP address, is not duplicated, but I don't really understand the error response at all.
I've been in touch with the AKS support team, but they seem to be at a loss as well and recommend just updating the existing node image versions, which I am 99% sure won't fix this. I'm stuck and don't fully understand what the actual issue is. Any help would be hugely appreciated, even if it's just a similar experience with an error like this one.
Thanks.
Upvotes: 0
Views: 3299
Reputation: 54
My reading of that error is that AKS doesn't recognize that the public IP is already there, so it tries to create it again. That attempt fails, so when you look, there's only one.
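One way to test that reading is to look at the load balancer itself rather than the public IP, since the error complains about two child resources of the load balancer sharing a name. Below is a minimal sketch, not a definitive diagnostic. It assumes you have the load balancer's JSON (for example from `az network lb show`) and that its child collections use the property names listed in `CHILD_COLLECTIONS`. The function name `find_duplicate_children` and the sample data are hypothetical, made up for illustration. Names are compared case-insensitively on the assumption that Azure treats resource names that way.

```python
import json
from collections import Counter

# Child collections expected in `az network lb show` output.
# Assumption: extend this tuple if your output contains other collections.
CHILD_COLLECTIONS = (
    "frontendIPConfigurations",
    "backendAddressPools",
    "loadBalancingRules",
    "probes",
    "inboundNatRules",
    "inboundNatPools",
    "outboundRules",
)


def find_duplicate_children(lb):
    """Return {collection: [names]} for child names that occur more than once.

    Hypothetical helper: names are lower-cased before counting, so
    "Example-IP" and "example-ip" are reported as one duplicated name.
    """
    duplicates = {}
    for collection in CHILD_COLLECTIONS:
        names = [
            child.get("name", "").lower()
            for child in (lb.get(collection) or [])
        ]
        repeated = sorted(
            name for name, count in Counter(names).items()
            if name and count > 1
        )
        if repeated:
            duplicates[collection] = repeated
    return duplicates


# Hypothetical sample standing in for real load balancer JSON.
sample_lb = json.loads("""
{
  "name": "example-lb",
  "frontendIPConfigurations": [
    {"name": "example-ip"},
    {"name": "Example-IP"},
    {"name": "other-ip"}
  ],
  "outboundRules": [{"name": "example-outbound-rule"}]
}
""")

print(find_duplicate_children(sample_lb))
# prints {'frontendIPConfigurations': ['example-ip']}
```

If this reports nothing for your real load balancer, the duplicate probably only exists in the request AKS is building, which would fit the idea that it is trying to add something that is already there.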
I'd try the following, in order.
Reader access over everything. This is based on the assumption that if AKS could see the existing resource, it wouldn't try to create it.
Caveat: These are in increasing order of risk to your existing cluster, and may result in a change in the public IP address, complete ingress failure, or worse. I would (ok.. I wouldn't, but you should) discuss these with the support team before you attempt them.
-Dave
Upvotes: 2