Reputation: 2163
The time limit for a batch job can be specified with sbatch's -t option when submitting to Slurm. For example, the following requests 1 day, 3 minutes, and 10 seconds:
$ sbatch -t 1-0:3:10 test.sh
My script needs to know how long it will run so that it can save all its data before terminating. The environment variables available to the job, as listed on the sbatch man page, do not include the time limit.
How can I determine this from within the script?
For now, I am asking the queue manager for the time limit on the current job:
#!/bin/sh
squeue -j $SLURM_JOB_ID -o "%l"
which gives
TIME_LIMIT
1-00:04:00
I parse the output using the following:
#!/bin/bash
# Query the scheduler for this job's time limit and convert it to seconds and hours.
TIMELIMIT=$(squeue -j "$SLURM_JOB_ID" -o "%l" | tail -1)
echo Time limit $TIMELIMIT
if [[ $TIMELIMIT == *-* ]]; then
    IFS='-' read -ra DAYS_HOURS <<< "$TIMELIMIT"
    DAYS=${DAYS_HOURS[0]}
    PART_DAYS=${DAYS_HOURS[1]}
else
    DAYS=0
    PART_DAYS=$TIMELIMIT
fi
if [[ $PART_DAYS == *:*:* ]]; then
    IFS=':' read -ra HMS <<< "$PART_DAYS"
    H=${HMS[0]}
    M=${HMS[1]}
    S=${HMS[2]}
else
    IFS=':' read -ra HMS <<< "$PART_DAYS"
    H=0
    M=${HMS[0]}
    S=${HMS[1]}
fi
# Use a name other than SECONDS, which is a special bash variable that keeps counting.
LIMIT_SECONDS=$(echo "((($DAYS*24+$H)*60+$M)*60+$S)" | bc)
echo Time limit: $LIMIT_SECONDS seconds
HOURS=$(echo "scale=3;((($DAYS*24+$H)*60+$M)*60+$S)/3600." | bc)
echo Time limit: $HOURS hours
which gives
Time limit 1-00:04:00
Time limit: 86404 seconds
Time limit: 24.001 hours
Is there a cleaner way to do this?
[Modified with correction given by Amit Ruhela 2022-05-17]
Following Telgar's suggestion, here's a Python script that receives the USR1 signal:
import signal
import time
import sys

stop = False

def recv(signum, stack):
    global stop
    stop = True
    dt = time.time() - t0
    print("Receive signal {signum} at {dt:.1f}s".format(**locals()), stack)
    sys.stdout.flush()

t0 = time.time()

def main():
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 6
    t = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    print("Running for {n} steps of length {t}s:".format(**locals()))
    sys.stdout.flush()
    for k in range(n):
        time.sleep(t)
        dt = time.time() - t0
        print("- step {k} of {n} after {dt:.1f}s".format(**locals()))
        sys.stdout.flush()
        if stop:
            break
    if stop:
        print("Stopped early.")
        sys.stdout.flush()

handler = signal.signal(signal.SIGUSR1, recv)
main()
This can be run from a trivial batch script:
#!/bin/sh
srun work.py 9 23
which is placed on the queue with a run time of two minutes and a USR1 signal 60s before the end:
sbatch --signal=USR1@60 -t0:2:0 batch.sh
producing:
Running for 9 steps of length 23s:
- step 0 of 9 after 23.0s
- step 1 of 9 after 46.0s
Receive signal 10 at 56.0s <frame object at 0x7f33671185e8>
- step 2 of 9 after 69.1s
Stopped early.
This doesn't use --signal=B:USR1@60, since the signal needs to go to the worker process rather than the batch script in my case. I didn't test it, but all MPI workers should receive the warning as well, allowing them to abandon their current work and exit. Note that you should only use this technique with applications that trap USR1: if there is no signal handler, the default action is to terminate the process with the error "User defined signal 1".
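As a quick local check of that default action (nothing Slurm-specific, and the exact message varies by shell and OS), sending USR1 to a shell with no handler installed simply kills it:
$ bash -c 'kill -USR1 $$; echo "never reached"'
User defined signal 1
$ echo $?    # 128 + 10 (SIGUSR1 on Linux)
138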
Even easier:
sbatch --signal=INT@60 -t0:2:0 batch.sh
Then the python code is:
for k in range(n):
    try:
        ... # do work for step k
    except KeyboardInterrupt:
        print("end early")
        break
As well as being easier to write, it also stops the current iteration immediately so you have a predictable amount of time to save state.
Upvotes: 3
Views: 2366
Reputation: 173
A few things.
If you use proctrack/cgroup, you can trap the SIGTERM signal that is sent when the time limit is up. That gives you a configurable amount of time to save state: SIGKILL is sent after KillWait seconds, which is configured in slurm.conf. However, this is difficult to make work with proctrack/linuxproc, because that plugin delivers SIGTERM to all processes, not just the bash script. Something like this:
#!/bin/bash
function sigterm {
    echo "SIGTERM"
    # save state
}
trap sigterm TERM

srun work.sh &
# This loop only breaks when all subprocesses exit
until wait; do :; done
This can be finicky to get right if you've never trapped signals in bash before. With proctrack/cgroup, SIGTERM is sent to the main process of each job step and to the batch script, so work.sh above would also have to trap SIGTERM. Also, bash does not act on a trapped signal until the foreground command finishes, which is why work.sh is backgrounded with '&' and the script sits in the wait loop.
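For completeness, here is a minimal sketch of what work.sh itself might look like; the long_running_computation command and the save-state step are hypothetical placeholders, not part of the setup above.
#!/bin/bash
# Hypothetical work.sh: the job step traps SIGTERM and saves its own state.
function sigterm {
    echo "work.sh: SIGTERM received, saving state"
    # write checkpoints/results here, before SIGKILL arrives KillWait seconds later
    exit 0
}
trap sigterm TERM

long_running_computation &   # hypothetical payload, backgrounded so the trap can run
until wait; do :; done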
If you really want to pass the time limit into the job, you could use an environment variable.
sbatch --export=ALL,TIMELIMIT=1-0:3:10 -t1-0:3:10 test.sh
Annoyingly, you have to specify the time limit twice.
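One way to avoid repeating yourself is to define the limit once in a shell variable (or a small wrapper script) at submission time; the variable names here are just for illustration:
# Define the limit once and pass it both as an environment variable and as -t.
LIMIT=1-0:3:10
sbatch --export=ALL,TIMELIMIT=$LIMIT -t "$LIMIT" test.sh
Inside test.sh the job can then read $TIMELIMIT and convert it to seconds with the parsing shown in the question.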
Querying the controller with squeue isn't a terrible solution. At scale, however, thousands of jobs querying the controller could impact performance. Note that you can use the --noheader flag so that TIME_LIMIT is not printed each time, instead of using tail.
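With that flag the query from the question collapses to one line (a sketch, using the same -o "%l" format string as above):
TIMELIMIT=$(squeue --noheader -j "$SLURM_JOB_ID" -o "%l")
echo "Time limit: $TIMELIMIT"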
Basically, this is what KillWait was designed for, so you should consider using it unless you can't for some reason. See https://slurm.schedmd.com/slurm.conf.html.
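KillWait itself is a cluster-wide slurm.conf parameter set by the administrator; the value below is only an example:
# slurm.conf (admin-controlled): seconds between SIGTERM and SIGKILL at the time limit
KillWait=120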
The best answer might be use of the --signal option for sbatch. This allows you to send a configurable signal to your job a certain amount of time before the end of the time limit.
sbatch --signal=B:USR1@120 myscript.sh
The example above sends USR1 to the batch script about 2 minutes before the end of the job. As noted in the man page, the resolution on this is 60 seconds, so the signal could be sent up to 60 seconds early.
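Here is a hedged sketch of a myscript.sh that acts on that signal; the save-state step is hypothetical, and the background-and-wait pattern from earlier is reused so bash can run the trap promptly:
#!/bin/bash
# Pairs with: sbatch --signal=B:USR1@120 myscript.sh
function usr1_handler {
    echo "USR1 received: roughly two minutes left, saving state"
    # hypothetical checkpoint/save-state step goes here
}
trap usr1_handler USR1

srun work.sh &
# Background + wait so bash can handle the signal while the step is still running
until wait; do :; done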
Upvotes: 2