Dolphin
Dolphin

Reputation: 39005

What may cause the Kubernetes Jenkins slave pod launching and suspended

I am using Kubernetes Jenkins to build the project, but sometimes when Jenkins starts a pod, it shows launching..... then suspended. and when I check the log output it shows 404.

HTTP ERROR 404 Not Found
URI:    /computer/default-j07v7/log
STATUS: 404
MESSAGE:    Not Found
SERVLET:    Stapler
Powered by Jetty:// 9.4.27.v20200227

This error looks like:

Image1

When the pod is suspended and goes to relaunching, again and again. The pod created events look normal:

Normal  Scheduled               default-scheduler   Successfully assigned infrastructure/default-v7m44 to k8sslave3
Normal  Pulled  1   2020-08-16T08:29:36Z    2020-08-16T08:29:36Z    kubelet Container image "jenkins/jnlp-slave:3.27-1" already present on machine
Normal  Created 1   2020-08-16T08:29:36Z    2020-08-16T08:29:36Z    kubelet Created container jnlp
Normal  Started 1   2020-08-16T08:29:36Z    2020-08-16T08:29:36Z    kubelet Started container jnlp

What should I do to fix this problem? Trying for days and I find if I tweak any parameter of pod templdate, the agent change to suspended immediately. If keep it by default, the agent should startup normal. It is wired problem and make me confusing. This is my jenkins master deployment yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: jenkins
  namespace: infrastructure
  selfLink: /apis/apps/v1/namespaces/infrastructure/deployments/jenkins
  uid: 3df24fd6-ffaf-4f17-8b02-a2904cabbf95
  resourceVersion: '1707498'
  generation: 38
  creationTimestamp: '2020-07-18T14:48:47Z'
  labels:
    app.kubernetes.io/component: jenkins-master
    app.kubernetes.io/instance: jenkins
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jenkins
    helm.sh/chart: jenkins-2.4.1
  annotations:
    deployment.kubernetes.io/revision: '10'
    meta.helm.sh/release-name: jenkins
    meta.helm.sh/release-namespace: infrastructure
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-02T10:08:04Z'
      fieldsType: FieldsV1
      
    - manager: dashboard
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-17T14:27:59Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:template':
            'f:spec':
              'f:containers':
                'k:{"name":"jenkins"}':
                  'f:volumeMounts':
                    'k:{"mountPath":"/usr/bin/docker"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                    'k:{"mountPath":"/var/run/docker.sock"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
              'f:securityContext':
                'f:runAsUser': {}
              'f:volumes':
                'k:{"name":"docker"}':
                  .: {}
                  'f:hostPath':
                    .: {}
                    'f:path': {}
                    'f:type': {}
                  'f:name': {}
                'k:{"name":"dockersock"}':
                  .: {}
                  'f:hostPath':
                    .: {}
                    'f:path': {}
                    'f:type': {}
                  'f:name': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-18T16:14:00Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:deployment.kubernetes.io/revision': {}
        'f:status':
          'f:availableReplicas': {}
          'f:conditions':
            .: {}
            'k:{"type":"Available"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Progressing"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: jenkins-master
      app.kubernetes.io/instance: jenkins
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: jenkins-master
        app.kubernetes.io/instance: jenkins
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: jenkins
        helm.sh/chart: jenkins-2.4.1
      annotations:
        checksum/config: 60990c68bb90ec59c79d56498da29d250d8da13cfbb9c35cad53f0cd789f318b
    spec:
      volumes:
        - name: plugins
          emptyDir: {}
        - name: tmp
          emptyDir: {}
        - name: jenkins-config
          configMap:
            name: jenkins
            defaultMode: 420
        - name: secrets-dir
          emptyDir: {}
        - name: plugin-dir
          emptyDir: {}
        - name: jenkins-home
          persistentVolumeClaim:
            claimName: jenkins
        - name: sc-config-volume
          emptyDir: {}
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
            type: ''
        - name: docker
          hostPath:
            path: /usr/bin/docker
            type: ''
      initContainers:
        - name: copy-default-config
          image: 'jenkins/jenkins:lts'
          command:
            - sh
            - /var/jenkins_config/apply_config.sh
          env:
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-password
            - name: ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-user
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: 50m
              memory: 256Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: jenkins-home
              mountPath: /var/jenkins_home
            - name: jenkins-config
              mountPath: /var/jenkins_config
            - name: secrets-dir
              mountPath: /usr/share/jenkins/ref/secrets/
            - name: plugins
              mountPath: /usr/share/jenkins/ref/plugins
            - name: plugin-dir
              mountPath: /var/jenkins_plugins
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      containers:
        - name: jenkins
          image: 'jenkins/jenkins:lts'
          args:
            - '--argumentsRealm.passwd.$(ADMIN_USER)=$(ADMIN_PASSWORD)'
            - '--argumentsRealm.roles.$(ADMIN_USER)=admin'
            - '--httpPort=8080'
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: slavelistener
              containerPort: 50000
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: JAVA_OPTS
              value: |

                -Dcasc.reload.token=$(POD_NAME) 
            - name: JENKINS_OPTS
            - name: JENKINS_SLAVE_AGENT_PORT
              value: '50000'
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-password
            - name: ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-user
            - name: CASC_JENKINS_CONFIG
              value: /var/jenkins_home/casc_configs
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: 50m
              memory: 256Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: jenkins-home
              mountPath: /var/jenkins_home
            - name: jenkins-config
              readOnly: true
              mountPath: /var/jenkins_config
            - name: secrets-dir
              mountPath: /usr/share/jenkins/ref/secrets/
            - name: plugin-dir
              mountPath: /usr/share/jenkins/ref/plugins/
            - name: sc-config-volume
              mountPath: /var/jenkins_home/casc_configs
            - name: dockersock
              mountPath: /var/run/docker.sock
            - name: docker
              mountPath: /usr/bin/docker
          livenessProbe:
            httpGet:
              path: /login
              port: http
              scheme: HTTP
            initialDelaySeconds: 90
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /login
              port: http
              scheme: HTTP
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
        - name: jenkins-sc-config
          image: 'kiwigrid/k8s-sidecar:0.1.144'
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: LABEL
              value: jenkins-jenkins-config
            - name: FOLDER
              value: /var/jenkins_home/casc_configs
            - name: NAMESPACE
              value: infrastructure
            - name: REQ_URL
              value: >-
                http://localhost:8080/reload-configuration-as-code/?casc-reload-token=$(POD_NAME)
            - name: REQ_METHOD
              value: POST
            - name: REQ_RETRY_CONNECT
              value: '10'
          resources: {}
          volumeMounts:
            - name: sc-config-volume
              mountPath: /var/jenkins_home/casc_configs
            - name: jenkins-home
              mountPath: /var/jenkins_home
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: jenkins
      serviceAccount: jenkins
      securityContext:
        runAsUser: 0
        fsGroup: 976
      schedulerName: default-scheduler
  strategy:
    type: Recreate
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 38
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2020-08-17T14:45:20Z'
      lastTransitionTime: '2020-08-17T14:45:20Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "jenkins-7454db64f6" has successfully progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2020-08-18T16:14:00Z'
      lastTransitionTime: '2020-08-18T16:14:00Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.

this is part of log output in master pod:

2020-08-21 16:44:40.381+0000 [id=955]   WARNING i.f.k.c.d.i.WatchConnectionManager$1#onFailure: Exec Failure
java.util.concurrent.RejectedExecutionException: Task okhttp3.RealCall$AsyncCall@2fb3e877 rejected from java.util.concurrent.ThreadPoolExecutor@9ce8b47[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 18]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:183)
Caused: java.io.InterruptedIOException: executor rejected
    at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:186)
    at okhttp3.Dispatcher.promoteAndExecute(Dispatcher.java:186)
    at okhttp3.Dispatcher.enqueue(Dispatcher.java:137)
    at okhttp3.RealCall.enqueue(RealCall.java:127)
    at okhttp3.internal.ws.RealWebSocket.connect(RealWebSocket.java:193)
    at okhttp3.OkHttpClient.newWebSocket(OkHttpClient.java:435)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.runWatch(WatchConnectionManager.java:158)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$1200(WatchConnectionManager.java:50)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2$1.execute(WatchConnectionManager.java:321)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$NamedRunnable.run(WatchConnectionManager.java:410)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:44:45.239+0000 [id=33]    INFO    hudson.slaves.NodeProvisioner#lambda$update$6: default-3393d provisioning successfully completed. We have now 3 computer(s)
2020-08-21 16:44:45.241+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/default-3393d
2020-08-21 16:44:45.302+0000 [id=2826]  INFO    o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:44:45.350+0000 [id=2765]  INFO    o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:44:55.363+0000 [id=2765]  WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-3393d, template=PodTemplate{inheritFrom='', name='default', namespace='', hostNetwork=false, activeDeadlineSeconds=10, label='jenkins-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins', command='/bin/sh -c', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='512m', resourceRequestMemory='512Mi', resourceLimitCpu='512m', resourceLimitMemory='512Mi', envVars=[ContainerEnvVar [getValue()=http://jenkins.infrastructure.svc.cluster.local:8080, getKey()=JENKINS_URL]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@5187faf3}]}
java.lang.IllegalStateException: Pod has terminated containers: infrastructure/default-3393d (jnlp)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:133)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:154)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:94)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:140)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:296)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:44:55.363+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-3393d
Terminated Kubernetes instance for agent infrastructure/default-3393d
Disconnected computer default-3393d
2020-08-21 16:44:55.383+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent infrastructure/default-3393d
2020-08-21 16:44:55.383+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer default-3393d
2020-08-21 16:45:05.198+0000 [id=42]    INFO    o.c.j.p.k.KubernetesCloud#provision: Excess workload after pending Kubernetes agents: 1
2020-08-21 16:45:05.198+0000 [id=42]    INFO    o.c.j.p.k.KubernetesCloud#provision: Template for label null: default
2020-08-21 16:45:12.383+0000 [id=955]   WARNING i.f.k.c.d.i.WatchConnectionManager$1#onFailure: Exec Failure
java.util.concurrent.RejectedExecutionException: Task okhttp3.RealCall$AsyncCall@6c6c7a45 rejected from java.util.concurrent.ThreadPoolExecutor@9ce8b47[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 18]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:183)
Caused: java.io.InterruptedIOException: executor rejected
    at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:186)
    at okhttp3.Dispatcher.promoteAndExecute(Dispatcher.java:186)
    at okhttp3.Dispatcher.enqueue(Dispatcher.java:137)
    at okhttp3.RealCall.enqueue(RealCall.java:127)
    at okhttp3.internal.ws.RealWebSocket.connect(RealWebSocket.java:193)
    at okhttp3.OkHttpClient.newWebSocket(OkHttpClient.java:435)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.runWatch(WatchConnectionManager.java:158)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$1200(WatchConnectionManager.java:50)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2$1.execute(WatchConnectionManager.java:321)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$NamedRunnable.run(WatchConnectionManager.java:410)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:45:15.236+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/default-03q6x
2020-08-21 16:45:15.252+0000 [id=36]    INFO    hudson.slaves.NodeProvisioner#lambda$update$6: default-03q6x provisioning successfully completed. We have now 3 computer(s)
2020-08-21 16:45:15.314+0000 [id=2824]  INFO    o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:45:15.381+0000 [id=2765]  INFO    o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:45:25.390+0000 [id=2765]  WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-03q6x, template=PodTemplate{inheritFrom='', name='default', namespace='', hostNetwork=false, activeDeadlineSeconds=10, label='jenkins-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins', command='/bin/sh -c', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='512m', resourceRequestMemory='512Mi', resourceLimitCpu='512m', resourceLimitMemory='512Mi', envVars=[ContainerEnvVar [getValue()=http://jenkins.infrastructure.svc.cluster.local:8080, getKey()=JENKINS_URL]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@5187faf3}]}
java.lang.IllegalStateException: Pod has terminated containers: infrastructure/default-03q6x (jnlp)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:133)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:154)
    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:94)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:140)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:296)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:45:25.391+0000 [id=2765]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-03q6x
Terminated Kubernetes instance for agent infrastructure/default-03q6x

and now this is my kubernetes cloud template snapshot:

enter image description here

this is the pod template config:

enter image description here

Upvotes: 4

Views: 6286

Answers (1)

Dashrath Mundkar
Dashrath Mundkar

Reputation: 9212

I would suggest few changes do it like this

  1. Keep everything blank for jenkins tunnel. Jenkins will automatically will pick it up.

  2. If you deployed this jenkins instance in kubernetes cluster then please use internal address for jenkins_url like http://jenkins.infrastructure.svc i assume your jenkins service name is jenkins and it is ClusterIP

Upvotes: 2

Related Questions