Reputation: 461
For the last month or so, two of our Windows slaves (connected via JNLP) have been disconnecting frequently. I am fairly sure that something changed in our network, since this affects only one geographical location (and all the slaves in that location), and there was a distinct point in time when we started receiving the node-offline emails. So far, though, our infrastructure team has drawn a blank.
The error I am seeing in the slave logs is:
JNLP agent connected from xx
Slave.jar version: 3.4.1
This is a Windows agent
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
I have spent hours trying to figure out what is wrong. I am not sure which logger I can configure to diagnose this further, so any help there would be appreciated, as would any other way of diagnosing what has gone awry.
We are running the latest LTS release 2.46.1 (but it was exhibiting this problem on an older LTS release, and I upgraded to see if the recent remoting changes helped, which unfortunately they didn't.)
I suspect the problem may be on the master side, since I can connect to another master from the same slave machine and don't see the disconnects.
Any help would be appreciated since I am all out of ideas.
thanks, Stu
Upvotes: 4
Views: 7386
Reputation: 1958
I had a similar issue with AWS ECS slave agents. For some of the failing builds, the advice in this article helped.
Try adding -Dhudson.remoting.Launcher.pingIntervalSec=-1
to the slave JVM options, and execute the following in the master's script console:
Jenkins.instance.injector.getInstance(hudson.slaves.ChannelPinger.class).@pingIntervalSeconds = -1
Jenkins.instance.injector.getInstance(hudson.slaves.ChannelPinger.class).@pingTimeoutSeconds = -1
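For the agent-side part, here is a rough sketch of where the property goes when the agent is launched from the command line. It assumes a plain java -jar slave.jar launch; the URL and secret are placeholders, not real values. The -D option has to come before -jar, otherwise the JVM hands it to slave.jar as a program argument instead of treating it as a system property.

```
java -Dhudson.remoting.Launcher.pingIntervalSec=-1 -jar slave.jar -jnlpUrl <master jenkins server url>/slave-agent.jnlp -secret <secret>
```

If the agent is started some other way (for example as a Windows service), the same -D option should go wherever that launcher defines the JVM arguments.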
Upvotes: 1
Reputation: 23
It might be a certificate error.
Open the jenkins-slave.xml file and add the -noCertificateCheck argument:
<arguments>-Xrs -jar "%BASE%\slave.jar" -jnlpUrl <master jenkins server url>/slave-agent.jnlp -secret <secret> -noCertificateCheck</arguments>
Restart the service and check whether the problem persists.
Upvotes: 0