d4v3mcc

Reputation: 11

How to continue a Jenkins pipeline after a host reboot?

I have a Jenkins job that reboots a host. It is part of a pipeline and has several downstream jobs. Currently the job reboots the host and then sleeps before starting the downstream builds. Is there a better way, within the job, to check whether the host is back up before continuing, instead of using sleep?

The Reboot_host job currently executes:

ssh <hostname> "sudo reboot"
sleep 90

The host is a VM, which is why the sleep duration is so short.

Upvotes: 1

Views: 4682

Answers (2)

Joerg S

Reputation: 5149

Assuming you're using Pipeline, I can describe what we do for our Windows machines: we use a job to install Windows Updates and trigger a reboot.

The whole process involves several steps and a lot of error checking. Because the code needs to access the Jenkins API, we put it into a global shared library in which all calls to the Jenkins API are encapsulated in @NonCPS methods. The following is just a rough sketch of what needs to be done; the full code would be far too long to post here.

Our process to reboot a Windows machine

To reboot a Linux machine you may not need all of these steps, but using them shouldn't hurt. Of course you have to implement proper error checking as well. I'd put the code in a library so it can also be unit-tested.

  1. Inside a node block, put the node offline (Computer.setTemporarilyOffline()) and then poll Computer.countBusy() (for heavy-weight executors) and Computer.getOneOffExecutors() (for flyweight executors) in a loop. You can get the Computer via getContext(hudson.FilePath).toComputer(). Once exactly one heavy-weight executor is in use (that's us) and no flyweight executor is, the node is ready for reboot. Keep the node offline and stay within the node block for the second step.
  2. Trigger the reboot from within the node block opened in the first step, while the Computer is still offline. Make sure to leave the node block immediately after running the reboot command, e.g. on Windows: bat 'shutdown /t 2 /r'
  3. Detect the reboot by waiting until the machine is no longer connected. To do that, we check whether we can still get a valid FilePath for that Computer: Computer.getNode().createPath(path), which returns null while the node is offline.
  4. Make sure to call Computer.disconnect(). At least for Windows machines this is very important, as Jenkins sometimes doesn't notice that it lost the connection and would try to use the old connection, which would fail.
  5. Wait for the OS to boot. We use a Linux node which pings the Windows machine until the ping is answered.
  6. Trigger Computer.connect(false) and wait until the connection has been re-established. We check Computer.getChannel() != null
  7. Put the node back online: Computer.setTemporarilyOffline(false, null)
  8. Computer.waitUntilOnline()
  9. Done :)
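Of these steps, only step 5 can be shown without the Jenkins API. Below is a minimal bash sketch of that polling loop; it is an illustration, not the answerer's actual library code. The function name wait_for_ping and its parameters are hypothetical, and it assumes Linux iputils ping, where -W is the reply timeout in seconds. A ping reply only means the network stack is up, not that the Jenkins agent is ready, which is why step 6 still waits for the channel separately.

```shell
#!/usr/bin/env bash
# Sketch of step 5 only: poll HOST with ICMP ping until it answers.
# Assumes Linux iputils ping: -c 1 sends one packet, -W 1 waits 1 second for a reply.

# wait_for_ping HOST MAX_ATTEMPTS INTERVAL  (hypothetical helper)
# Returns 0 on the first reply, 1 if all MAX_ATTEMPTS pings go unanswered.
wait_for_ping() {
  local host=$1 max_attempts=$2 interval=$3
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
      echo "$host answered ping after $attempt attempt(s)"
      return 0
    fi
    sleep "$interval"   # pause before the next ping
  done
  echo "$host did not answer after $max_attempts attempts" >&2
  return 1
}
```

For example, wait_for_ping <hostname> 60 5 || exit 1 sends up to 60 pings about 5 seconds apart and fails the shell step if none is answered.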

Upvotes: 1

BOC

Reputation: 1699

I'm assuming you're using a Jenkinsfile here, since you said "pipeline"; if not, please provide a bit more info on your job (freestyle with an Execute shell step, etc.).

You'll probably still need sleep, but you can combine it with retry to get faster success (and faster failure). Assuming you just need the VM to be up, you could use something like:

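retry(20) {                                       // give up after 20 failed attempts
    sleep time: 5, unit: 'SECONDS'                // pause before each attempt
    sh 'ssh -o ConnectTimeout=1 <hostname> exit'  // exits non-zero until sshd answers
}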

This tries to ssh to the host every 5 seconds. ConnectTimeout=1 means ssh waits only 1 second for the connection to be established, and exit just closes the session as soon as it succeeds. retry re-runs the block up to 20 times, until the sh step exits with 0 (success). If all 20 attempts fail, the build fails (which is probably what you want, since it means your VM isn't available for the downstream jobs).
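If the reboot job is a freestyle job with an Execute shell step rather than a Jenkinsfile, the same retry-with-sleep idea can be written in plain bash. This is a minimal sketch; wait_until is a hypothetical helper name, not a Jenkins or ssh feature.

```shell
#!/usr/bin/env bash
# Plain-shell version of the retry/sleep pattern, e.g. for a freestyle
# job's "Execute shell" step.

# wait_until ATTEMPTS INTERVAL CMD [ARGS...]  (hypothetical helper)
# Sleeps INTERVAL seconds, then runs CMD; repeats up to ATTEMPTS times.
# Returns 0 as soon as CMD exits 0, or 1 if every attempt fails.
wait_until() {
  local attempts=$1 interval=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    sleep "$interval"
    if "$@"; then
      return 0
    fi
  done
  return 1
}
```

For example, running wait_until 20 5 ssh -o ConnectTimeout=1 <hostname> exit || exit 1 after ssh <hostname> "sudo reboot" mirrors the retry(20) block and fails the step on timeout. Sleeping before each attempt, rather than after, also gives the host a moment to actually go down, so the first attempt is less likely to reach the old sshd before the reboot has started.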

If there's a specific service you're waiting for, you could curl or otherwise make an attempt to contact that service instead of using ssh.
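A minimal bash sketch of such a service check, assuming the service speaks HTTP. wait_for_http is a hypothetical helper; instead of counting attempts like retry(20), it bounds the total wait time using bash's built-in SECONDS counter.

```shell
#!/usr/bin/env bash
# wait_for_http URL TIMEOUT [INTERVAL]  (hypothetical helper)
# Polls URL with curl until it returns a non-error HTTP status, giving up
# once TIMEOUT seconds of wall-clock time have passed.
wait_for_http() {
  local url=$1 timeout=$2 interval=${3:-5}
  local deadline=$((SECONDS + timeout))   # SECONDS: bash's seconds-since-start counter
  # --fail makes curl exit non-zero on HTTP status >= 400;
  # --max-time 2 caps each individual request at 2 seconds.
  until curl --silent --fail --max-time 2 --output /dev/null "$url"; do
    if ((SECONDS >= deadline)); then
      echo "gave up waiting for $url after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, wait_for_http http://<hostname>:8080/health 120 || exit 1, where the port and the /health path are placeholders for whatever your service actually exposes.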

Upvotes: 0
