user655321

Reputation: 1732

`kubectl delete service` gets stuck in 'Terminating' state

I'm trying to delete a service I wrote and deployed to Azure Kubernetes Service (along with the Dask components that accompany it). When I run kubectl delete -f my-manifest.yaml, my service gets stuck in the Terminating state. The console reports that it was deleted, but the command hangs:

> kubectl delete -f my-manifest.yaml
service "dask-scheduler" deleted
deployment.apps "dask-scheduler" deleted
deployment.apps "dask-worker" deleted
service "my-service" deleted
deployment.apps "my-deployment" deleted

I have to Ctrl+C this command. When I check my services, Dask has been successfully deleted, but my custom service hasn't. If I try to manually delete it, it similarly hangs/fails:

> kubectl get services
NAME                TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
kubernetes          ClusterIP      x.x.x.x      <none>        443/TCP                      18h
my-service          LoadBalancer   x.x.x.x      x.x.x.x       80:30786/TCP,443:31934/TCP   18h

> kubectl delete service my-service
service "my-service" deleted

This question says to delete the pods first, but all my pods are deleted (kubectl get pods returns nothing). There's also this closed K8s issue that says --wait=false might fix foreground cascade deletion, but this doesn't work and doesn't seem to be the issue here anyway (as the pods themselves have already been deleted).

I assume that I can completely wipe out my AKS cluster and re-create, but that's an option of last resort here. I don't know whether it's relevant, but my service is using the azure-load-balancer-internal: "true" annotation for the service, and I have a webapp deployed to my VNet that uses this service.

Is there any other way to force shutdown this service?

Upvotes: 8

Views: 16943

Answers (3)

MXWest

Reputation: 401

This is a Windows follow-up to Cr4zyTun4's answer, which patches the finalizers to null so that deletion can complete.

I needed a slightly different syntax on the Windows CLI:

kubectl patch service svc-name -n namespace -p "{\"metadata\":{\"finalizers\":null}}"

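When I used the version with the patch enclosed in single quotes ('), the command failed on Windows, because cmd.exe does not treat single quotes as quoting characters and the JSON reaches the server mangled: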

Error from server (BadRequest): json: cannot unmarshal string into Go value of type map[string]interface {}
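
Another way to sidestep the quoting differences is to keep the patch in a file and pass it with --patch-file, so no shell has to quote the JSON at all. A minimal sketch in bash, assuming a kubectl recent enough to support --patch-file; clear_finalizers is a made-up helper name, and the service and namespace are placeholders. The same two steps (save the JSON to a file, point kubectl at it) should carry over to cmd or PowerShell:

```shell
#!/usr/bin/env bash
# Sketch only: clear_finalizers is a hypothetical helper, not part of kubectl.

clear_finalizers() {
  local svc="$1" ns="${2:-default}" patch_file status
  patch_file="$(mktemp)"
  # Same patch as in the answers above, written to a file instead of the command line.
  printf '%s\n' '{"metadata":{"finalizers":null}}' > "$patch_file"
  kubectl patch service "$svc" -n "$ns" --patch-file "$patch_file"
  status=$?
  rm -f "$patch_file"
  return "$status"
}

# Example (requires access to a cluster):
#   clear_finalizers svc-name namespace
```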

Upvotes: 4

Cr4zyTun4

Reputation: 765

I had a similar issue: a service could not connect to its pod because the pod had already been deleted:

HTTPConnectionPool(host='scv-name-not-shown-because-prod.namespace-prod', port=7999): Max retries exceeded with url: my-url-not-shown-because-prod (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faee4b112b0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

I was able to solve this with the patch command:

kubectl patch service scv-name-not-shown-because-prod -n namespace-prod -p '{"metadata":{"finalizers":null}}'

I think the service had gotten into an invalid state and was not able to recover.
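
Before clearing the finalizers, it can be worth checking what is actually holding the object. A minimal sketch in bash, assuming a cluster is reachable; show_stuck_state is a made-up helper name, and the service and namespace are placeholders:

```shell
#!/usr/bin/env bash
# Sketch only: show_stuck_state is a hypothetical helper, not part of kubectl.

show_stuck_state() {
  local svc="$1" ns="${2:-default}"
  # deletionTimestamp is set once deletion has been requested;
  # finalizers lists what is still blocking the actual removal.
  kubectl get service "$svc" -n "$ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# Example (requires access to a cluster):
#   show_stuck_state scv-name-not-shown-because-prod namespace-prod
```

A non-empty deletionTimestamp together with a remaining finalizer suggests that some controller has not finished its cleanup. Nulling the finalizers skips that cleanup, so anything the controller was meant to remove (a cloud load balancer, for example) may need to be checked by hand afterwards.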

Upvotes: 12

user655321

Reputation: 1732

Thanks to @4c74356b41's suggestion of looking at kubectl describe service my-service (which I hadn't considered for some reason), I saw this warning:

Code="LinkedAuthorizationFailed" Message="The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/loadBalancers/write' on scope '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/loadBalancers/kubernetes-internal'; however, it does not have permission to perform action 'Microsoft.Network/virtualNetworks/subnets/join/action' on the linked scope(s) '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>' or the linked scope(s) are invalid.

(The client and object id GUIDs are the same value.)

This indicated that it's not exactly a Kubernetes issue, but more of a permissions issue within the Azure ecosystem. I looked through the portal and didn't find that GUID in any of my users, groups, or apps, so I'm not sure what it's referring to. However, I granted the Owner role to this client id, and after a few minutes, the service was deleted.

az role assignment create `
    --role Owner `
    --assignee xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
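
Owner is a broad grant. Since the error only complains about 'Microsoft.Network/virtualNetworks/subnets/join/action' on the subnet, a narrower assignment scoped to that subnet may be enough. A sketch in bash, under the assumption that the built-in Network Contributor role covers the missing action and that the GUID is the identity the cluster uses to manage Azure resources (the answer above does not confirm the latter); grant_subnet_access is a made-up helper name and both arguments are placeholders:

```shell
#!/usr/bin/env bash
# Sketch only: grant_subnet_access is a hypothetical helper, not part of the Azure CLI.

grant_subnet_access() {
  local assignee="$1" subnet_id="$2"
  # Scope the assignment to the subnet from the error message
  # rather than granting rights at subscription level.
  az role assignment create \
    --role "Network Contributor" \
    --assignee "$assignee" \
    --scope "$subnet_id"
}

# Example (placeholders copied from the error message above):
#   grant_subnet_access xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
#     "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>"
```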

Upvotes: 4
