BayanA

Reputation: 53

Jenkins inbound-agent can't reach master within Kubernetes

Last week I was experimenting with Jenkins, setting it up via the Helm chart with Kubernetes ephemeral agents, and I got it working. Then this weekend I did something wrong (not sure what), and the agents are no longer able to come up. When I trigger the sample hello-world pipeline, the agents try to connect, but the pods just keep starting and failing in the cluster. I uninstalled Jenkins and set it up again, and I am still having the same issue.

Details:

The Jenkins master logs show this again and again as the master tries to provision the agent:

Jan 04, 2021 4:35:34 AM WARNING org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Error in provisioning; agent=KubernetesSlave name: default-4khvl, template=PodTemplate{id='3816c387-4b94-482d-bdc9-87901b3d402a', name='default', label='jenkins-jenkins-agent', serviceAccount='default', nodeUsageMode=NORMAL, podRetention='Never', containers=[ContainerTemplate{name='jnlp', image='*************/archive/jenkins/inbound-agent:4.6-1-alpine', workingDir='/home/jenkins', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='2', resourceRequestMemory='4Gi', resourceLimitCpu='2', resourceLimitMemory='4Gi', envVars=[KeyValueEnvVar [getValue()=http://jenkins.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
Also:   java.lang.Throwable: launched here
    at hudson.slaves.SlaveComputer._connect(SlaveComputer.java:283)
    at hudson.model.Computer.connect(Computer.java:435)
    at hudson.slaves.CloudRetentionStrategy.start(CloudRetentionStrategy.java:73)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:83)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:46)
    at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:162)
    at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:44)
    at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:224)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:207)
    at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1634)
    at jenkins.model.Nodes$2.run(Nodes.java:139)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at jenkins.model.Nodes.addNode(Nodes.java:135)
    at jenkins.model.Jenkins.addNode(Jenkins.java:2157)
    at hudson.slaves.NodeProvisioner.lambda$update$6(NodeProvisioner.java:256)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:225)
    at hudson.slaves.NodeProvisioner.access$900(NodeProvisioner.java:64)
    at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:821)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalStateException: Agent is not connected after 31 seconds, status: Failed
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:233)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:294)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The pod logs show this when I retain the pods after they error out:

➜  kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/default-nd2k0
default-nd2k0: line 1: 18a950820798693f38009beef2323ecaf4acabcff0d0e5603bce62f8417d3e6c: not found

I have also tried to bring up a permanent agent, by setting up the agent on the master and then bringing up the pod, but I've had no success there either.

Permanent agent YAML:

---
apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    app: "worker-agent"
  labels:
    worker: "worker-agent"
  name: "kube-1"
  namespace: "jenkins-agent"
spec:
  containers:
  - env:
    - name: "JENKINS_SECRET"
      value: "83a734ff2152633ed7f7ca0150b3fa28c2cbe370ca91c4f7ca513379613fb7bd"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-agent.svc.cluster.local:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "kube-1"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins/agent"
    - name: "JENKINS_URL"
      value: "http://jenkins.jenkins.svc.cluster.local:8080/jenkins"
    image: "jenkins/inbound-agent:4.6-1-alpine"
    imagePullPolicy: "Always"
    name: "jnlp"
    resources:
      limits:
        cpu: "2000m"
        memory: "2048Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  nodeSelector:
    kubernetes.io/os: "linux"
  restartPolicy: "Never"
  volumes:
  - emptyDir:
      medium: ""
    name: "workspace-volume"
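One detail in the manifest above may be worth a second look, separately from the timeout: `JENKINS_TUNNEL` is set to `jenkins-agent.svc.cluster.local:50000`, whereas the `nslookup` output later in this post resolves `jenkins-agent.jenkins.svc.cluster.local`, i.e. with the namespace included. A minimal Python sketch of that naming pattern, assuming the default `cluster.local` cluster domain; `service_fqdn` and `looks_fully_qualified` are hypothetical helper names made up for illustration:

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name <service>.<namespace>.svc.<cluster_domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def looks_fully_qualified(host: str, cluster_domain: str = "cluster.local") -> bool:
    """Return True if host has exactly the shape <service>.<namespace>.svc.<cluster_domain>."""
    suffix = f".svc.{cluster_domain}"
    if not host.endswith(suffix):
        return False
    # What remains before the suffix should be two non-empty labels:
    # the service name and the namespace.
    labels = host[: -len(suffix)].split(".")
    return len(labels) == 2 and all(labels)
```

With these helpers, `looks_fully_qualified("jenkins-agent.svc.cluster.local")` is false, while `service_fqdn("jenkins-agent", "jenkins")` yields the name that `nslookup` resolved. This is only a consistency check on the names; it is probably not the cause of the HTTP timeout in the agent log.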

Logs of the permanent agent:

➜  kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jan 04, 2021 4:48:54 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.6
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/jenkins]
Jan 04, 2021 4:49:25 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
    at hudson.remoting.Engine.innerRun(Engine.java:689)
    at hudson.remoting.Engine.run(Engine.java:514)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
    ... 2 more
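The `connect timed out` in the log above occurs during the plain HTTP request to `tcpSlaveAgentListener`, before the agent gets as far as the agent port. A timeout, as opposed to a refused connection or a DNS error, usually suggests that packets are being dropped somewhere between the pods. A minimal Python sketch for telling these cases apart from inside a pod, assuming a Python 3 interpreter is available there; `probe_tcp` is a hypothetical helper name, not part of any Jenkins or Kubernetes tooling:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns-failure"
    except socket.timeout:
        # No answer within the timeout: traffic is likely being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For the setup in this post, one would call `probe_tcp("jenkins.jenkins.svc.cluster.local", 8080)` and `probe_tcp("jenkins-agent.jenkins.svc.cluster.local", 50000)` from a pod in the `jenkins-agent` namespace and compare the results with the same calls made against some other service in the cluster.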

All services and pods are up, and DNS seems to work:

➜  kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins.jenkins.svc.cluster.local
Server:     100.100.64.10
Address:    100.100.64.10#53

Name:   jenkins.jenkins.svc.cluster.local
Address: 100.100.106.175

➜  kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins-agent.jenkins.svc.cluster.local
Server:     100.100.64.10
Address:    100.100.64.10#53

Name:   jenkins-agent.jenkins.svc.cluster.local
Address: 100.100.77.168

➜  kubernetes-jenkins git:(master) ✗ kubectl get all -n jenkins
NAME            READY   STATUS    RESTARTS   AGE
pod/jenkins-0   2/2     Running   0          84m

NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)     AGE
service/jenkins         ClusterIP   100.100.106.175   <none>        8080/TCP    84m
service/jenkins-agent   ClusterIP   100.100.77.168    <none>        50000/TCP   84m

NAME                       READY   AGE
statefulset.apps/jenkins   1/1     84m

Any ideas on what I should check would be greatly appreciated.

I will try to strip my Jenkins Helm chart down to the bare minimum to get it working again, and I will keep this post up to date with my trials and errors.

Upvotes: 0

Views: 1421

Answers (1)

BayanA

Reputation: 53

I ran a utility pod and saw that I could curl other pod services on the cluster, just not Jenkins. I then brought up Jenkins on a pet cluster and saw that the curl command worked there.

After I restarted each node in my cluster, the curl command worked. I'm not sure what the issue was; I wish I knew. I was then able to launch the agents successfully.

Pod command and output:

➜  kubernetes-jenkins git:(master) ✗ kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -n jenkins -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.0# curl http://jenkins.jenkins:8080/jenkins/tcpSlaveAgentListener/


  Jenkins
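The same check can be scripted so that it also reports which agent port the master advertises. The sketch below is a rough Python equivalent of the curl call above. It assumes, to my understanding of how the remoting client discovers the port, that the listener reports it in an `X-Jenkins-JNLP-Port` response header (with `X-Hudson-JNLP-Port` as an older name); the function names are hypothetical:

```python
import urllib.request
from typing import Mapping, Optional, Tuple


def listener_url(jenkins_url: str) -> str:
    """Append the agent listener path to the Jenkins root URL."""
    return jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"


def advertised_agent_port(headers: Mapping[str, str]) -> Optional[int]:
    """Extract the advertised agent port from response headers, if present."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: these are the header names the listener uses.
    for name in ("x-jenkins-jnlp-port", "x-hudson-jnlp-port"):
        value = lowered.get(name, "").strip()
        if value.isdigit():
            return int(value)
    return None


def check_listener(jenkins_url: str, timeout: float = 5.0) -> Tuple[int, str, Optional[int]]:
    """GET the listener and return (HTTP status, body text, advertised agent port)."""
    with urllib.request.urlopen(listener_url(jenkins_url), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", "replace").strip()
        return resp.status, body, advertised_agent_port(dict(resp.headers.items()))
```

Calling `check_listener("http://jenkins.jenkins:8080/jenkins")` from a pod in the cluster should return status 200 and the body `Jenkins` when the master is reachable, matching the curl output above, and raise an exception on a timeout.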

Upvotes: 0
