Jason M

Reputation: 1053

In a Google Cloud Kubernetes cluster my pods sometimes all restart. How do I find the reason for the restart?

From time to time all my pods restart, and I'm not sure how to figure out why it's happening. Is there somewhere in Google Cloud where I can get that information, or a kubectl command I can run? It happens every couple of months or so, maybe less frequently than that.

Upvotes: 2

Views: 3140

Answers (2)

Reid123

Reputation: 274

It's also a good idea to check your cluster and node-pool operations.

  1. Check the cluster operations in Cloud Shell by running the command:
gcloud container operations list
  2. Check the age of the nodes with the command:
kubectl get nodes
  3. Check and analyze how your deployment reacts to operations such as a cluster upgrade, node-pool upgrade, or node-pool auto-repair. You can check Cloud Logging to see whether your cluster or node pools were upgraded using the queries further below; a command-line sketch for steps 1 and 2 follows right after this list.
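A rough sketch of steps 1 and 2 (the operation ID and zone are placeholders you would adapt to your own cluster):

# Step 1: list recent cluster and node-pool operations (upgrades, auto-repairs, resizes)
gcloud container operations list

# Optionally inspect a single operation in more detail
gcloud container operations describe operation-1234567890-abcdef --zone=us-central1-a

# Step 2: nodes whose AGE roughly matches the restart time were probably recreated
kubectl get nodes -o wide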

Please note that you have to add your cluster and node-pool names to the queries.

Control plane (master) upgraded:

resource.type="gke_cluster"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.methodName:("UpdateCluster" OR "UpdateClusterInternal")
(protoPayload.metadata.operationType="UPGRADE_MASTER"
  OR protoPayload.response.operationType="UPGRADE_MASTER")
resource.labels.cluster_name=""

Node-pool upgraded:

resource.type="gke_nodepool"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.methodName:("UpdateNodePool" OR "UpdateClusterInternal")
protoPayload.metadata.operationType="UPGRADE_NODES"
resource.labels.cluster_name=""
resource.labels.nodepool_name=""
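If you prefer the terminal to the Logs Explorer, the same node-pool filter can be passed to gcloud logging read; a minimal sketch, assuming a cluster named my-cluster and a 90-day look-back window:

# Query the audit log for node-pool upgrade operations from the command line
gcloud logging read '
  resource.type="gke_nodepool"
  log_id("cloudaudit.googleapis.com/activity")
  protoPayload.methodName:("UpdateNodePool" OR "UpdateClusterInternal")
  protoPayload.metadata.operationType="UPGRADE_NODES"
  resource.labels.cluster_name="my-cluster"
' --freshness=90d --limit=20 --format=json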

Upvotes: 3

Sai Chandra Gadde

Reputation: 3301

Use the methods below to check the reason for a pod restart:

Use kubectl describe deployment <deployment_name> and kubectl describe pod <pod_name>, which contain this information.

# Events:
#   Type     Reason   Age                 From               Message
#   ----     ------   ----                ----               -------
#   Warning  BackOff  40m                 kubelet, gke-xx    Back-off restarting failed container
# ..

You can see that the pod was restarted because its container kept failing and went into back-off. We need to troubleshoot that particular issue.
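Before digging into a single pod, the restart counts and the last terminated state are also visible from the CLI; a small sketch (the pod name is a placeholder):

# Sort pods by restart count to spot the noisy ones
kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'

# Show why the previous run of the first container in one pod ended (e.g. Error, OOMKilled)
kubectl get pod your_pod_name -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'

# Recent cluster events often show node drains, evictions and kills
kubectl get events --sort-by='.lastTimestamp'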

Check the logs using: kubectl logs <pod_name>

To get the previous logs of your container (the one that restarted), you can use the --previous flag, like this:

kubectl logs your_pod_name --previous
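If the pod runs more than one container, name the container explicitly (the container name here is a placeholder):

kubectl logs your_pod_name -c your_container_name --previous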

You can also write a final message to /dev/termination-log, and it will show up as described in the docs.
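As a quick sketch, assuming a single-container pod, the termination message can then be read back with jsonpath:

kubectl get pod your_pod_name -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'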

Attaching a troubleshooting doc for reference.

Upvotes: 2
