Reputation: 53
Last week I was experimenting with Jenkins, setting it up via the Helm chart with ephemeral Kubernetes agents, and I got it working. Then this weekend I did something wrong (not sure what) and the agents are no longer able to come up. When I trigger the sample hello-world pipeline, the agent pods try to connect but keep starting and failing in the cluster. I uninstalled Jenkins and set it up again, and I am still having the same issue.
Details:
The Jenkins master logs show this again and again as the master tries to provision the agent:
Jan 04, 2021 4:35:34 AM WARNING org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Error in provisioning; agent=KubernetesSlave name: default-4khvl, template=PodTemplate{id='3816c387-4b94-482d-bdc9-87901b3d402a', name='default', label='jenkins-jenkins-agent', serviceAccount='default', nodeUsageMode=NORMAL, podRetention='Never', containers=[ContainerTemplate{name='jnlp', image='*************/archive/jenkins/inbound-agent:4.6-1-alpine', workingDir='/home/jenkins', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='2', resourceRequestMemory='4Gi', resourceLimitCpu='2', resourceLimitMemory='4Gi', envVars=[KeyValueEnvVar [getValue()=http://jenkins.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
Also: java.lang.Throwable: launched here
at hudson.slaves.SlaveComputer._connect(SlaveComputer.java:283)
at hudson.model.Computer.connect(Computer.java:435)
at hudson.slaves.CloudRetentionStrategy.start(CloudRetentionStrategy.java:73)
at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:83)
at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:46)
at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:162)
at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:44)
at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:224)
at hudson.model.Queue._withLock(Queue.java:1398)
at hudson.model.Queue.withLock(Queue.java:1275)
at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:207)
at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1634)
at jenkins.model.Nodes$2.run(Nodes.java:139)
at hudson.model.Queue._withLock(Queue.java:1398)
at hudson.model.Queue.withLock(Queue.java:1275)
at jenkins.model.Nodes.addNode(Nodes.java:135)
at jenkins.model.Jenkins.addNode(Jenkins.java:2157)
at hudson.slaves.NodeProvisioner.lambda$update$6(NodeProvisioner.java:256)
at hudson.model.Queue._withLock(Queue.java:1398)
at hudson.model.Queue.withLock(Queue.java:1275)
at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:225)
at hudson.slaves.NodeProvisioner.access$900(NodeProvisioner.java:64)
at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:821)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalStateException: Agent is not connected after 31 seconds, status: Failed
at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:233)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:294)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The pod logs show this when I retain the pods after they error out:
➜ kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/default-nd2k0
default-nd2k0: line 1: 18a950820798693f38009beef2323ecaf4acabcff0d0e5603bce62f8417d3e6c: not found
I have also tried to bring up a permanent agent: I configured the agent on the master and then brought up the pod, but I've had no success there either.
Permanent agent YAML:
---
apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    app: "worker-agent"
  labels:
    worker: "worker-agent"
  name: "kube-1"
  namespace: "jenkins-agent"
spec:
  containers:
  - env:
    - name: "JENKINS_SECRET"
      value: "83a734ff2152633ed7f7ca0150b3fa28c2cbe370ca91c4f7ca513379613fb7bd"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-agent.svc.cluster.local:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "kube-1"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins/agent"
    - name: "JENKINS_URL"
      value: "http://jenkins.jenkins.svc.cluster.local:8080/jenkins"
    image: "jenkins/inbound-agent:4.6-1-alpine"
    imagePullPolicy: "Always"
    name: "jnlp"
    resources:
      limits:
        cpu: "2000m"
        memory: "2048Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  nodeSelector:
    kubernetes.io/os: "linux"
  restartPolicy: "Never"
  volumes:
  - emptyDir:
      medium: ""
    name: "workspace-volume"
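One thing that may be worth double-checking in the manifest above: the JENKINS_TUNNEL host has one label before .svc, while JENKINS_URL has two. Assuming the usual Kubernetes convention that a fully qualified Service name looks like <service>.<namespace>.svc.cluster.local (and that the cluster domain is the default cluster.local), a small check like the following sketch flags the difference. The helper name split_service_fqdn is made up for illustration. This is probably not the cause of the timeout shown in the agent logs, since that failure happens on the HTTP URL before the tunnel is used.

```python
def split_service_fqdn(host, cluster_domain="cluster.local"):
    """Return (service, namespace) if host has the shape
    <service>.<namespace>.svc.<cluster_domain>, otherwise None.

    cluster_domain defaults to "cluster.local", which is an assumption
    about the cluster, not something read from its configuration.
    """
    suffix = ".svc." + cluster_domain
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    # Exactly two non-empty labels are expected: service and namespace.
    if len(labels) != 2 or not all(labels):
        return None
    return labels[0], labels[1]


# The first value is copied from the manifest; the second is the name
# that was resolved with nslookup.
for value in (
    "jenkins-agent.svc.cluster.local:50000",
    "jenkins-agent.jenkins.svc.cluster.local:50000",
):
    host = value.rsplit(":", 1)[0]  # drop the port
    print(value, "->", split_service_fqdn(host))
```

The first value comes back as None because the namespace label is missing; the second splits into the service name and the namespace.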
Logs of the permanent agent:
➜ kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jan 04, 2021 4:48:54 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.6
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/jenkins]
Jan 04, 2021 4:49:25 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
at hudson.remoting.Engine.innerRun(Engine.java:689)
at hudson.remoting.Engine.run(Engine.java:514)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
... 2 more
All services and pods are up, and DNS seems to work:
➜ kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins.jenkins.svc.cluster.local
Server: 100.100.64.10
Address: 100.100.64.10#53
Name: jenkins.jenkins.svc.cluster.local
Address: 100.100.106.175
➜ kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins-agent.jenkins.svc.cluster.local
Server: 100.100.64.10
Address: 100.100.64.10#53
Name: jenkins-agent.jenkins.svc.cluster.local
Address: 100.100.77.168
➜ kubernetes-jenkins git:(master) ✗ kubectl get all -n jenkins
NAME            READY   STATUS    RESTARTS   AGE
pod/jenkins-0   2/2     Running   0          84m

NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)     AGE
service/jenkins         ClusterIP   100.100.106.175   <none>        8080/TCP    84m
service/jenkins-agent   ClusterIP   100.100.77.168    <none>        50000/TCP   84m

NAME                       READY   AGE
statefulset.apps/jenkins   1/1     84m
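A successful nslookup only shows that the name resolves. The agent log above fails one step later, at the TCP connect ("connect timed out"). As a rough sketch, the two steps can be checked separately from inside a pod that has Python available. The function name check_endpoint and the result strings are invented for this example.

```python
import socket


def check_endpoint(host, port, timeout=5.0):
    """Resolve host, then attempt a TCP connect to the first address.

    Returns one of: "dns-failed", "timeout", "refused", "unreachable", "ok".
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failed"

    family, socktype, proto, _canonname, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
        except socket.timeout:
            # Packets are being dropped somewhere between pod and Service.
            return "timeout"
        except ConnectionRefusedError:
            # The address answered, but nothing is listening on the port.
            return "refused"
        except OSError:
            return "unreachable"
    return "ok"


# Example calls (not executed here; hostnames and ports are taken from
# the output above):
#   check_endpoint("jenkins.jenkins.svc.cluster.local", 8080)
#   check_endpoint("jenkins-agent.jenkins.svc.cluster.local", 50000)
```

A "timeout" result alongside working DNS would point at the network path (CNI, kube-proxy, network policy) rather than at name resolution or at Jenkins itself.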
Any ideas on what I should check would be greatly appreciated.
I will try to pare my Jenkins Helm chart down to the bare minimum to get it working again, and I will keep this post up to date with my trials and errors.
Reputation: 53
I ran a utility pod and saw that I could curl other services on the cluster, just not Jenkins. I then brought up Jenkins on a pet cluster and saw that the same curl command worked there.
After restarting each node in my cluster, the curl command worked. I'm not sure what the issue was, and I wish I knew. I was then able to launch the agents successfully.
Pod command and output:
➜ kubernetes-jenkins git:(master) ✗ kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -n jenkins -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.0# curl http://jenkins.jenkins:8080/jenkins/tcpSlaveAgentListener/
Jenkins
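The same check can be scripted if curl is not available in the image. The sketch below requests the tcpSlaveAgentListener endpoint and reports the HTTP status. It also looks for an X-Jenkins-JNLP-Port response header, which, as far as I know, is how Jenkins advertises the inbound agent port to connecting agents. Treat that header name as an assumption to verify against your Jenkins version. The function name probe_agent_listener is made up for this example.

```python
import urllib.error
import urllib.request


def probe_agent_listener(jenkins_url, timeout=5.0):
    """GET <jenkins_url>/tcpSlaveAgentListener/ and summarise the result."""
    url = jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Header names are lower-cased so the lookup is case-insensitive.
            headers = {k.lower(): v for k, v in resp.headers.items()}
            return {
                "url": url,
                "status": resp.status,
                # Assumed header name; None if the server does not send it.
                "jnlp_port": headers.get("x-jenkins-jnlp-port"),
            }
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 or 404).
        return {"url": url, "status": exc.code, "jnlp_port": None}
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP response at all: DNS failure, refusal, or timeout.
        return {"url": url, "status": None, "error": str(exc)}


# Example call (not executed here; the URL is the one used with curl):
#   probe_agent_listener("http://jenkins.jenkins:8080/jenkins")
```

A result with "status": None reproduces the situation from before the node restarts, where no HTTP response came back at all.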