Reputation: 2989
First, the background to this intriguing challenge. During development and testing, the continuous integration build often fails because of deadlocks, infinite loops, or other bugs that make a test run forever. When that happens, none of the mechanisms for notifying us that a build has failed ever fire, so they become useless.
The solution will be to have the build script time out if nothing is written to the build log for more than 5 minutes. The build routinely writes out the name of each unit test as it proceeds, so a silent log is the best way to tell that it has "frozen".
Okay. Now the nitty gritty...
The build server uses Hudson to run a simple bash script that invokes the more complex build script based on Nant and MSBuild (all on Windows).
So far, every solution I've found online puts a timeout on the total run time of the command. That doesn't work here: a test might hang or freeze within the first 5 minutes, and a limit on total run time can't tell a hung build apart from one that is still making progress.
What we've thought of so far:
First, here's the high-level bash command that runs the full test suite in Hudson:
build.sh clean free test
That command simply sends all the Nant and MSBuild build logging to stdout.
It's obvious that we need to tee that output to a file:
build.sh clean free test 2>&1 | tee build.out
Then, in parallel, another command needs to sleep, check the modification time of the file, and kill the main process if the file hasn't changed in more than 5 minutes. A kill -9 will be fine at that point; nothing graceful is needed once the build has frozen.
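A minimal sketch of that watchdog, assuming GNU `stat -c %Y` and a filesystem that updates the log's modification time while it is still being written (the follow-up below reports that Windows does not until the file is closed). `watch_log` and its arguments are hypothetical names. One wrinkle with the command above: after `cmd | tee build.out &`, `$!` is the PID of `tee`, not of `build.sh`, so the usage lines redirect straight to the file instead. Killing only `build.sh` may also leave the NAnt/MSBuild processes it started still running.

```shell
#!/usr/bin/env bash
# watch_log LOG PID LIMIT [INTERVAL]   (hypothetical helper)
# Polls LOG's modification time every INTERVAL seconds while PID is alive.
# If LOG has not changed for more than LIMIT seconds, sends SIGKILL to PID
# and returns 1; returns 0 once PID has exited on its own.
watch_log() {
  local log=$1 pid=$2 limit=$3 interval=${4:-10}
  local now last
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$interval"
    now=$(date +%s)
    # A log that does not exist yet is treated as freshly written.
    last=$(stat -c %Y "$log" 2>/dev/null) || last=$now
    if (( now - last > limit )); then
      echo "watch_log: $log idle for over ${limit}s, killing $pid" >&2
      kill -KILL "$pid" 2>/dev/null || true
      return 1
    fi
  done
  return 0
}

# Usage sketch (output goes to the file, so $! is build.sh itself):
#   build.sh clean free test > build.out 2>&1 &
#   build_pid=$!
#   watch_log build.out "$build_pid" 300
#   wait "$build_pid"   # 137 (128 + SIGKILL) if the watchdog fired
```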
That's the part you can help with.
In fact, I wrote a script like this over 15 years ago to drop the connection on a data phone line to Japan after periods of inactivity, but I can't remember how I did it.
Sincerely, Wayne
Upvotes: 0
Views: 1816
Reputation: 2989
Solved this myself by writing a bash script.
It's called iotimeout and takes one parameter: the timeout in seconds.
You use it like this:
build.sh clean dev test | iotimeout 120
iotimeout has 2 loops.
One is a simple while read line loop that echoes each line, but it also uses the touch command to update the modified time of a temp file every time it writes a line. Unfortunately, it wasn't possible to monitor the build.out file itself, because Windoze doesn't update a file's modified time until the file is closed. Oh well.
The other loop runs in the background: a forever loop that sleeps 10 seconds and then checks the modified time of the temp file. If the file is ever more than 120 seconds old, that loop forces the entire process group to exit.
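A minimal sketch of the two-loop design just described; the script itself wasn't posted, so this is a reconstruction, not the author's code. It is written as a bash function so it can be sourced and tested. The optional second argument INTERVAL is a hypothetical addition (the description hard-codes 10 seconds), and the sketch assumes GNU coreutils `stat -c %Y` for reading the stamp file's modification time.

```shell
#!/usr/bin/env bash
# iotimeout TIMEOUT [INTERVAL]
# Copies stdin to stdout line by line. If no line arrives for more than
# TIMEOUT seconds, kills every process in the current process group.
# Sketch only: assumes GNU coreutils (mktemp, touch, stat -c %Y).
# INTERVAL is a hypothetical optional argument (default 10 seconds).
iotimeout() {
  local timeout=$1 interval=${2:-10}
  local stamp watchdog line
  stamp=$(mktemp) || return 1
  touch "$stamp"

  # Background loop: every INTERVAL seconds, compare the stamp file's
  # modification time with the current time.
  (
    while sleep "$interval" >/dev/null 2>&1; do
      last=$(stat -c %Y "$stamp" 2>/dev/null) || exit 0  # stamp gone: done
      now=$(date +%s)
      if (( now - last > timeout )); then
        echo "iotimeout: no output for over ${timeout}s, killing process group" >&2
        rm -f "$stamp"
        # Like the author's `kill 0`, but with SIGKILL: signals every
        # process in this process group, including this loop.
        kill -KILL 0
      fi
    done
  ) </dev/null >/dev/null &
  watchdog=$!

  # Foreground loop: pass each line through and refresh the stamp's mtime.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "$line"
    touch "$stamp"
  done

  # Input ended normally: stop the watchdog and remove the stamp file.
  kill "$watchdog" 2>/dev/null || true
  wait "$watchdog" 2>/dev/null || true
  rm -f "$stamp"
  return 0
}
```

To use it as a standalone script the way the post does (`build.sh clean dev test | iotimeout 120`), save it as iotimeout and add `iotimeout "$@"` as its last line. Because `kill 0` reaches the whole process group, the shell running that pipeline is killed along with the hung build, so Hudson should see the build step fail.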
The only tricky part was returning the exit code of the original program. Bash's PIPESTATUS array solves that: ${PIPESTATUS[0]} holds the exit code of the first command in the pipeline.
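A small sketch of that exit-code handling. `run_filtered` is a hypothetical helper, and `cat` stands in for iotimeout so the example is self-contained; with a bare `cmd | filter`, `$?` would be the filter's status (0 from `cat`), hiding a failed build.

```shell
#!/usr/bin/env bash
# run_filtered CMD [ARGS...]   (hypothetical helper)
# Runs CMD through a filter and returns CMD's own exit code.
run_filtered() {
  "$@" 2>&1 | cat
  # PIPESTATUS holds one exit code per pipeline stage and is overwritten
  # by the next command, so it is read right away.
  return "${PIPESTATUS[0]}"
}

# In the Hudson job itself this becomes:
#   build.sh clean dev test 2>&1 | iotimeout 120
#   exit "${PIPESTATUS[0]}"
```

Bash's `set -o pipefail` is an alternative: the pipeline's status then becomes that of the rightmost stage that exited non-zero, or zero if every stage succeeded.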
Also, figuring out how to kill the entire process group took some research, but it turns out to be easy: kill 0
Upvotes: 0
Reputation: 136256
build.sh clean free test 2>&1 | tee build.out &
sleep 300
kill -KILL %1
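The fixed 300-second limit above always waits the full five minutes, even when the build finishes sooner. A sketch of the same total-time limit that returns as soon as the command exits; `run_with_deadline` is a hypothetical name. Note that this is still a limit on total run time, not the inactivity timeout the question asks for. GNU coreutils also provides `timeout -s KILL 300 CMD`, which applies the same kind of limit.

```shell
#!/usr/bin/env bash
# run_with_deadline SECONDS CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and SIGKILLs it if it is still running after
# SECONDS; returns CMD's exit status (137 if it was killed).
run_with_deadline() {
  local limit=$1; shift
  local pid killer status=0
  "$@" &
  pid=$!
  # Killer: sleeps, then kills CMD. Its fds point at /dev/null so a
  # leftover sleep cannot hold the caller's output pipe open.
  ( sleep "$limit"; kill -KILL "$pid" 2>/dev/null ) </dev/null >/dev/null 2>&1 &
  killer=$!
  wait "$pid" || status=$?
  # CMD finished (or was killed): cancel the killer.
  kill "$killer" 2>/dev/null || true
  return "$status"
}

# Usage:
#   run_with_deadline 300 build.sh clean free test
```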
Upvotes: 1