Ogal Finklestein

Reputation: 65

Using kubectl to restart helm pods

Still pretty new to kubectl. I have a Rancher test environment (deployed via Terraform) that I am learning things on. I received a timeout error while trying to deploy a new k8s cluster to my environment. I looked at the pods and found 4 helm pods, all with errors:

% kubectl get pods --all-namespaces
NAMESPACE                 NAME                                                   READY   STATUS      RESTARTS   AGE
cattle-logging            rancher-logging-fluentd-linux-6x8vr                    2/2     Running     0          20h
cattle-logging            rancher-logging-fluentd-linux-9llsf                    2/2     Running     0          20h
cattle-logging            rancher-logging-fluentd-linux-hhwtb                    2/2     Running     0          20h
cattle-logging            rancher-logging-fluentd-linux-rzbc8                    2/2     Running     0          20h
cattle-logging            rancher-logging-log-aggregator-linux-9q6w8             1/1     Running     0          20h
cattle-logging            rancher-logging-log-aggregator-linux-b27c4             1/1     Running     0          20h
cattle-logging            rancher-logging-log-aggregator-linux-h8q75             1/1     Running     0          20h
cattle-logging            rancher-logging-log-aggregator-linux-hhbk7             1/1     Running     0          20h
cattle-system             helm-operation-2ztsk                                   1/2     Error       0          41m
cattle-system             helm-operation-7jlwf                                   1/2     Error       0          12m
cattle-system             helm-operation-fv5hq                                   1/2     Error       0          55m
cattle-system             helm-operation-zbdnd                                   1/2     Error       0          27m
cattle-system             rancher-6f77f5cbb4-cs4sp                               2/2     Running     0          42m
cattle-system             rancher-6f77f5cbb4-gvkv7                               2/2     Running     0          42m
cattle-system             rancher-6f77f5cbb4-jflnb                               2/2     Running     0          42m
cert-manager              cert-manager-cainjector-596464bfbd-zj2wg               1/1     Running     0          6h39m
cert-manager              cert-manager-df467b89d-c5kdw                           1/1     Running     0          6h39m
cert-manager              cert-manager-df467b89d-kbvgm                           1/1     Running     0          6h39m
cert-manager              cert-manager-df467b89d-lndnp                           1/1     Running     0          6h40m
cert-manager              cert-manager-webhook-55f8bd4b8c-m58n2                  1/1     Running     0          6h39m
fleet-system              fleet-agent-6688b99df5-n26zf                           1/1     Running     0          6h40m
fleet-system              fleet-controller-6dc545d5db-f6f2t                      1/1     Running     0          6h40m
fleet-system              gitjob-84bd8cf9c4-4q95g                                1/1     Running     0          6h40m
ingress-nginx             nginx-nginx-ingress-controller-58689b79d9-44q95        1/1     Running     0          6h40m
ingress-nginx             nginx-nginx-ingress-controller-58689b79d9-blgpf        1/1     Running     0          6h39m
ingress-nginx             nginx-nginx-ingress-controller-58689b79d9-wkdg9        1/1     Running     0          6h40m
ingress-nginx             nginx-nginx-ingress-default-backend-65d7b58ccc-tbwlk   1/1     Running     0          6h39m
kube-system               coredns-799dffd9c4-nmplh                               1/1     Running     0          6h39m
kube-system               coredns-799dffd9c4-stjhl                               1/1     Running     0          6h40m
kube-system               coredns-autoscaler-7868844956-qr67l                    1/1     Running     0          6h41m
kube-system               kube-flannel-5wzd7                                     2/2     Running     0          20h
kube-system               kube-flannel-hm7tc                                     2/2     Running     0          20h
kube-system               kube-flannel-hptdm                                     2/2     Running     0          20h
kube-system               kube-flannel-jjbpq                                     2/2     Running     0          20h
kube-system               kube-flannel-pqfkh                                     2/2     Running     0          20h
kube-system               metrics-server-59c6fd6767-ngrzg                        1/1     Running     0          6h40m
kube-system               rke-coredns-addon-deploy-job-l7n2b                     0/1     Completed   0          20h
kube-system               rke-metrics-addon-deploy-job-bkpf2                     0/1     Completed   0          20h
kube-system               rke-network-plugin-deploy-job-vht9d                    0/1     Completed   0          20h
metallb-system            controller-7686dfc96b-fn7hw                            1/1     Running     0          6h39m
metallb-system            speaker-9l8fp                                          1/1     Running     0          20h
metallb-system            speaker-9mxp2                                          1/1     Running     0          20h
metallb-system            speaker-b2ltt                                          1/1     Running     0          20h
rancher-operator-system   rancher-operator-576f654978-5c4kb                      1/1     Running     0          6h39m

I would like to see if restarting the pods would set them straight, but I cannot figure out how to do so. The helm pods do not show up under kubectl get deployments --all-namespaces, so I cannot scale them or do a kubectl rollout restart.
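
For reference, this is roughly what I checked (the grep is just to filter the output, the jsonpath lookup is an extra sanity check, and the pod name is taken from the listing above):

% kubectl get deployments --all-namespaces | grep helm
# returns nothing: the helm-operation pods are not backed by a Deployment

% kubectl get pod helm-operation-2ztsk -n cattle-system -o jsonpath='{.metadata.ownerReferences}'
# if this is empty too, nothing owns the pod, so there is no rollout to restart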

How can I restart these pods?

Upvotes: 1

Views: 2004

Answers (2)

Wytrzymały Wiktor

Reputation: 13858

As you already noticed, restarting the Pods might not be the way to go with your problem. The better approach is to get a clearer idea of what exactly went wrong and then focus on fixing that. To do so, you can follow the steps below (in that order); a short command sketch follows the list:

  1. Debug the Pods by executing kubectl describe pods ${POD_NAME} and checking the reason behind their failure. Note that once your pod has been scheduled, the methods described in Debug Running Pods become available. These methods are:
  • Examining pod logs: with kubectl logs ${POD_NAME} ${CONTAINER_NAME} or kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}

  • Debugging with container exec: by running commands inside a specific container with kubectl exec

  • Debugging with an ephemeral debug container: Ephemeral containers are useful for interactive troubleshooting when kubectl exec is insufficient because a container has crashed or a container image doesn't include debugging utilities, such as with distroless images. Beginning with version v1.18, kubectl has an alpha command that can create ephemeral containers for debugging.

  • Debugging via a shell on the node: If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host.
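
Here is a minimal sketch of those steps against one of the failing pods from your output (the pod name comes from your kubectl get pods listing; ${CONTAINER_NAME} is a placeholder that kubectl describe will reveal):

% kubectl describe pod helm-operation-2ztsk -n cattle-system
# check the Events section and the container State/Reason for the failure cause

% kubectl logs helm-operation-2ztsk -n cattle-system --all-containers
% kubectl logs helm-operation-2ztsk -n cattle-system -c ${CONTAINER_NAME} --previous

% kubectl exec -it helm-operation-2ztsk -n cattle-system -c ${CONTAINER_NAME} -- sh
# only works while that container is still running

% kubectl debug -it helm-operation-2ztsk -n cattle-system --image=busybox
# ephemeral debug container; on v1.18 this was the alpha kubectl debug command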

Those steps should be enough to get to the core of the problem and then focus on fixing it.

Upvotes: 1

paulina moreno

Reputation: 56

You could try to see more information about a specific pod for troubleshooting with the command: kubectl describe pod <pod-name>
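
For example, with one of the failing pods from your question (pod name and namespace taken from your kubectl get pods output):

% kubectl describe pod helm-operation-7jlwf -n cattle-system
# the Events section and the container State/Reason usually explain the Error status

% kubectl get events -n cattle-system --field-selector involvedObject.name=helm-operation-7jlwf
# the same events as a flat list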

Upvotes: 1
