pau

Reputation: 31

Jenkins EC2 Windows Slave agents with time-outs

I have a Jenkins infrastructure with one master and 4 slaves hosted in our local data center, and it works fine. I need to add more Windows slave agents, so I installed the Amazon EC2 Plugin for Jenkins. When I launch a new EC2 node from the Jenkins master and the Java agent starts, the log reports a lot of TimeoutExceptions:

EC2 (AMAZON-EU-WEST-1) - Jenkins Compilation  (i-0f8c1b63a843e4602) booted at 1579938111000
Connecting to (10.180.3.133) with WinRM as automatictv
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
WinRM service responded. Waiting for WinRM service to stabilize on EC2 (AMAZON-EU-WEST-1) - Jenkins Compilation  (i-0f8c1b63a843e4602)
WinRM should now be ok on EC2 (AMAZON-EU-WEST-1) - Jenkins Compilation  (i-0f8c1b63a843e4602)
Connected with WinRM.
Creating tmp directory if it does not exist
init script ran successfully
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar C:\Windows\Temp\remoting.jar -workDir c:\jenkins
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.36
This is a Windows agent
ERROR: ERROR: Failed to monitor for Response Time
Failed to monitor for Free Disk Space
ERROR: Failed to monitor for Free Swap Space
java.util.concurrent.TimeoutException
    at hudson.remoting.Request$1.get(Request.java:316)
    at hudson.remoting.Request$1.get(Request.java:240)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
java.util.concurrent.TimeoutException
    at hudson.remoting.Request$1.get(Request.java:316)
    at hudson.remoting.Request$1.get(Request.java:240)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
java.util.concurrent.TimeoutException
    at hudson.remoting.Request$1.get(Request.java:316)
    at hudson.remoting.Request$1.get(Request.java:240)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
    at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:57)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
ERROR: Failed to monitor for Free Temp Space
java.util.concurrent.TimeoutException
    at hudson.remoting.Request$1.get(Request.java:316)
    at hudson.remoting.Request$1.get(Request.java:240)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
    at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
Agent successfully connected and online

The master logs also report timeout exceptions every 18 seconds:

Jan 25 09:11:20 localhost docker/jenkins[1345]: 2020-01-25 08:11:20.586+0000 [id=137]#011WARNING#011h.plugins.ec2.win.WinConnection#ping: Failed to verify connectivity to Windows slave
Jan 25 09:11:20 localhost docker/jenkins[1345]: java.net.SocketTimeoutException: connect timed out
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.PlainSocketImpl.socketConnect(Native Method)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.net.Socket.connect(Socket.java:589)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at hudson.plugins.ec2.win.WinConnection.ping(WinConnection.java:108)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at hudson.plugins.ec2.win.EC2WindowsLauncher.connectToWinRM(EC2WindowsLauncher.java:169)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at hudson.plugins.ec2.win.EC2WindowsLauncher.launchScript(EC2WindowsLauncher.java:39)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:290)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Jan 25 09:11:20 localhost docker/jenkins[1345]: #011at java.lang.Thread.run(Thread.java:748)
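The trace shows WinConnection.ping failing inside java.net.Socket.connect, i.e. a plain TCP connect from the master to the agent that times out. A minimal Python sketch of the same kind of check, which you could run from the master (or from inside its Docker container) to see whether the agent port is reachable at all. The port is an assumption: 5985 and 5986 are the standard WinRM HTTP/HTTPS ports, but check which one your EC2 cloud configuration actually uses.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    This mirrors the kind of check failing in the trace above
    (java.net.Socket.connect throwing SocketTimeoutException).
    """
    try:
        # create_connection raises OSError (socket.timeout, ConnectionRefusedError, ...)
        # when the connection cannot be established in time.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, if can_connect("10.180.3.133", 5985) returns False from the master while the instance is running, that would point at a security group or Windows firewall rule rather than at Jenkins itself.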

If I launch a new job on this agent, it runs very slowly. For example, a simple echo script that runs in less than 1 second on the local agents takes 10 seconds on the EC2 Windows slave agent, which also looks like a timeout issue.

I am running Jenkins 2.204.1 with Amazon EC2 plugin version 1.49.

Upvotes: 3

Views: 2346

Answers (1)

adam

Reputation: 921

Since Jenkins and its plugins are not always developed under the same roof, here is what I suggest to anyone who wants to debug and fix this:

Debug:

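Start the EC2 Windows instance and go to the directory C:\Windows\Temp\remoting\logs (more generally, remoting keeps its logs in <workDir>\remoting\logs, so with -workDir c:\jenkins as in the question, look in c:\jenkins\remoting\logs). Reading these logs helps a lot in finding the root cause on the Windows side.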
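A small helper for that step, assuming the remoting logs are plain-text files in a single directory. The function name and keyword list are illustrative, not part of Jenkins:

```python
from pathlib import Path
from typing import List, Sequence

DEFAULT_KEYWORDS = ("SEVERE", "ERROR", "Exception")


def remoting_log_errors(log_dir: str, keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return 'file: line' for every log line that contains one of the keywords."""
    hits = []
    # Read every regular file in the directory, in file-name order.
    for path in sorted(p for p in Path(log_dir).iterdir() if p.is_file()):
        # errors="replace" keeps going if a log contains bytes that are not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if any(k in line for k in keywords):
                hits.append(f"{path.name}: {line}")
    return hits
```

On the agent you would call it with the log directory, e.g. remoting_log_errors(r"C:\Windows\Temp\remoting\logs"), and print the result.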

Firewall issues (on the instance itself), a few options:

The Jenkins master needs network access to the agent.

  • Use (or clone) a Windows AMI whose firewall is disabled or already has the specific rules you need
  • In the Jenkins cloud init script, add something like: powershell -Command "New-NetFirewallRule -DisplayName 'Allow OpenTelemetry' -Direction Inbound -Protocol TCP -LocalPort 4317 -Action Allow"
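If the master has to reach several ports, the same rule can be generated once per port for the init script. A sketch, assuming the master needs the WinRM ports 5985/5986 in addition to the OpenTelemetry port from the command above; the helper is hypothetical and only builds the command strings, it does not run them:

```python
from typing import Iterable, List, Tuple


def firewall_rule_commands(rules: Iterable[Tuple[str, int]]) -> List[str]:
    """Build one inbound-allow New-NetFirewallRule command per (name, TCP port) pair."""
    commands = []
    for name, port in rules:
        commands.append(
            'powershell -Command "New-NetFirewallRule '
            f"-DisplayName 'Allow {name}' -Direction Inbound "
            f'-Protocol TCP -LocalPort {port} -Action Allow"'
        )
    return commands


# Hypothetical port list: WinRM (HTTP/HTTPS) plus the OpenTelemetry port from the answer.
RULES = [("WinRM HTTP", 5985), ("WinRM HTTPS", 5986), ("OpenTelemetry", 4317)]

if __name__ == "__main__":
    # Print the lines to paste into the EC2 cloud's init script.
    print("\n".join(firewall_rule_commands(RULES)))
```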

Plugins issues:

In my case the problem was the OpenTelemetry plugin, so I disabled it in Manage Jenkins > Plugins.

Cloud config issues: the agent JVM settings (for example -Xms2048m -Xmx4096m) and the size of your instance could also be related to this.
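To sanity-check the JVM settings against the instance size, here is a rough sketch that parses -Xmx and compares it with the instance RAM. The 75% headroom is an arbitrary assumption (the OS, the build tools, and the JVM's non-heap memory also need room), and the function names are hypothetical:

```python
import re

# Multipliers for the optional size suffix of -Xms/-Xmx (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap_flag(flag: str) -> int:
    """Convert a JVM heap flag such as '-Xmx4096m' to bytes."""
    m = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", flag)
    if not m:
        raise ValueError(f"not a -Xms/-Xmx flag: {flag}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def heap_fits(jvm_opts: str, instance_ram_gib: float, headroom: float = 0.75) -> bool:
    """True if the max heap (-Xmx) stays within `headroom` of the instance RAM."""
    xmx = [parse_heap_flag(f) for f in jvm_opts.split() if f.startswith("-Xmx")]
    if not xmx:
        return True  # no explicit max heap; the JVM chooses its own default
    # If -Xmx is given more than once, the last occurrence is the one that applies.
    return xmx[-1] <= headroom * instance_ram_gib * 1024**3
```

With the options from the answer, heap_fits("-Xms2048m -Xmx4096m", 4) is False for an instance with 4 GiB of RAM and True for one with 8 GiB.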

Last words:

  • When one of our Jenkins agents misbehaves and we check the logs, the memory-related error can be misleading, because that error was always there.
  • You may still see these monitoring errors depending on the OS; Jenkins may not be able to monitor Windows or Mac agents fully, but the steps above address the connectivity issues. Hope this helps.

Upvotes: 0