Zach

Reputation: 757

Replication Controller Not Starting Pod

I have a replication controller that keeps creating a pod, but the pod never comes up. How do I get to the replication controller logs so I can debug this?

Output of kubectl describe rc:

Name:       jenkins-leader-restored
Namespace:  default
Image(s):   gcr.io/cloud-solutions-images/jenkins-gcp-leader:master-5ca73a6
Selector:   name=jenkins,role=leader
Labels:     name=jenkins,role=leader
Replicas:   0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen LastSeen    Count   From                SubobjectPath   Reason          Message
  ───────── ────────    ─────   ────                ─────────────   ──────          ───────
  15m       15m     1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-xxr93
  12m       12m     1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-1e44w
  11m       11m     1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-y3llu
  8m        8m      1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-wfd70
  8m        8m      1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-8ji09
  5m        5m      1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-p4wbc
  4m        4m      1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-tvreo
  1m        1m      1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-l6rpy
  56s       56s     1   {replication-controller }           SuccessfulCreate    Created pod: jenkins-leader-restored-4asg5

I am following the 'Practice Restoring a Backup' section of the Automated Image Builds with Jenkins, Packer, and Kubernetes repo.

Upvotes: 1

Views: 9258

Answers (1)

Zach

Reputation: 757

Prashanth B. identified the root cause of my issue: two replication controllers with the same selector but different replica counts were running at the same time.
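
One way to spot this kind of conflict is to list every replication controller next to its selector and look for repeats. The sketch below is only an illustration: find_duplicate_selectors is a hypothetical helper name, and it only catches selectors that are textually identical, not ones that merely overlap.

```shell
# Print every selector that is used by more than one replication controller.
# Input: lines of "<rc-name> <selector>" on stdin.
# Output: "<selector>: <rc-name> <rc-name> ..." for each shared selector.
find_duplicate_selectors() {
  awk '{ count[$2]++; names[$2] = names[$2] " " $1 }
       END { for (s in count) if (count[s] > 1) print s ":" names[s] }'
}

# Against a live cluster. Assumption: the selector is the last column of the
# wide output, which depends on the kubectl version, so check the header first.
# kubectl get rc -o wide --no-headers | awk '{ print $1, $NF }' | find_duplicate_selectors
```

If this prints a line for name=jenkins,role=leader, more than one controller is managing the same set of pods.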

On the Google Compute Engine instance, the kubelet (which runs the pod) logs to /var/log/kubelet.log. Looking there would have shown that the pod was being removed immediately after it was created.

My troubleshooting could have gone like this:

  1. Identify that the pod isn't running as intended: kubectl get pods

  2. Check the replication controller: kubectl describe rc

  3. Search the kubelet log for a pod that was created, using a pod name from the events in the previous command's output: grep xxr93 /var/log/kubelet.log

    user@gke-stuff-d9adf8e28-node-13cl:~$ grep xxr93 /var/log/kubelet.log 
    I1203 16:59:09.337110    3366 kubelet.go:2005] SyncLoop (ADD): "jenkins-leader-restored-xxr93_default"
    I1203 16:59:09.345356    3366 kubelet.go:2008] SyncLoop (UPDATE): "jenkins-leader-restored-xxr93_default"
    I1203 16:59:09.345423    3366 kubelet.go:2011] SyncLoop (REMOVE): "jenkins-leader-restored-xxr93_default"
    I1203 16:59:09.345503    3366 kubelet.go:2101] Failed to delete pod "jenkins-leader-restored-xxr93_default", err: pod not found
    I1203 16:59:09.483104    3366 manager.go:1707] Need to restart pod infra container for "jenkins-leader-restored-xxr93_default" because it is not found
    I1203 16:59:13.695134    3366 kubelet.go:1823] Killing unwanted pod "jenkins-leader-restored-xxr93"
    E1203 17:00:47.026865    3366 manager.go:1920] Error running pod "jenkins-leader-restored-xxr93_default" container "jenkins": impossible: cannot find the mounted volumes for pod "jenkins-leader-restored-xxr93_default"
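
To make the add-then-remove pattern in that output easier to see, the SyncLoop lines can be reduced to a timestamp and an event name. This is a rough sketch: pod_sync_events is a hypothetical helper, and the field positions are assumed from the log lines shown above, so other kubelet versions may need different field numbers.

```shell
# Print "<time> <event>" for each SyncLoop line that mentions the given pod.
# $1: pod name or suffix to search for (e.g. xxr93)
# $2: kubelet log path (defaults to /var/log/kubelet.log)
pod_sync_events() {
  pod="$1"
  log="${2:-/var/log/kubelet.log}"
  grep -F -- "$pod" "$log" |
    # Assumed layout: $2 = time, $5 = "SyncLoop", $6 = "(ADD):" and similar.
    awk '$5 == "SyncLoop" { gsub(/[():]/, "", $6); print $2, $6 }'
}

# Usage on the node:
# pod_sync_events xxr93
```

A REMOVE a few milliseconds after the ADD, as in the log above, suggests that something other than the pod itself (here, the competing replication controller) is deleting it.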
    

Upvotes: 2
