Reputation: 2163
The time limit for a batch job can be specified with sbatch's -t option when submitting to Slurm. For example, the following requests 1 day, 3 minutes, and 10 seconds:
$ sbatch -t 1-0:3:10 test.sh
My script needs to know how long it will run so that it can save all its data before terminating. The environment variables available to the job, as listed on the sbatch man page, do not include the time limit.
How can I determine this from within the script?
For now, I am asking the queue manager for the time limit on the current job:
#!/bin/sh
squeue -j $SLURM_JOB_ID -o "%l"
which gives
TIME_LIMIT
1-00:04:00
I parse the output using the following:
#!/bin/bash
# Query the scheduler for this job's time limit and convert it to seconds and hours.
TIMELIMIT=$(squeue -j "$SLURM_JOB_ID" -o "%l" | tail -1)
echo Time limit $TIMELIMIT
if [[ $TIMELIMIT == *-* ]]; then
    IFS='-' read -ra DAYS_HOURS <<< "$TIMELIMIT"
    DAYS=${DAYS_HOURS[0]}
    PART_DAYS=${DAYS_HOURS[1]}
else
    DAYS=0
    PART_DAYS=$TIMELIMIT
fi
if [[ $PART_DAYS == *:*:* ]]; then
    IFS=':' read -ra HMS <<< "$PART_DAYS"
    H=${HMS[0]}
    M=${HMS[1]}
    S=${HMS[2]}
else
    IFS=':' read -ra HMS <<< "$PART_DAYS"
    H=0
    M=${HMS[0]}
    S=${HMS[1]}
fi
# Use a name other than SECONDS, which is a special bash variable that keeps counting.
LIMIT_SECONDS=$(echo "((($DAYS*24+$H)*60+$M)*60+$S)" | bc)
echo Time limit: $LIMIT_SECONDS seconds
HOURS=$(echo "scale=3;((($DAYS*24+$H)*60+$M)*60+$S)/3600." | bc)
echo Time limit: $HOURS hours
which gives
Time limit 1-00:04:00
Time limit: 86404 seconds
Time limit: 24.001 hours
Is there a cleaner way to do this?
[Modified with correction given by Amit Ruhela 2022-05-17]
Following Telgar's suggestion, here's a Python script that receives the USR1 signal:
import signal
import time
import sys

stop = False

def recv(signum, stack):
    global stop
    stop = True
    dt = time.time() - t0
    print("Receive signal {signum} at {dt:.1f}s".format(**locals()), stack)
    sys.stdout.flush()

t0 = time.time()

def main():
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 6
    t = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    print("Running for {n} steps of length {t}s:".format(**locals()))
    sys.stdout.flush()
    for k in range(n):
        time.sleep(t)
        dt = time.time() - t0
        print("- step {k} of {n} after {dt:.1f}s".format(**locals()))
        sys.stdout.flush()
        if stop:
            break
    if stop:
        print("Stopped early.")
        sys.stdout.flush()

handler = signal.signal(signal.SIGUSR1, recv)
main()
This can be run from a trivial batch script:
#!/bin/sh
srun work.py 9 23
which is placed on the queue with a run time of two minutes and a USR1 signal 60s before the end:
sbatch --signal=USR1@60 -t0:2:0 batch.sh
producing:
Running for 9 steps of length 23s:
- step 0 of 9 after 23.0s
- step 1 of 9 after 46.0s
Receive signal 10 at 56.0s <frame object at 0x7f33671185e8>
- step 2 of 9 after 69.1s
Stopped early.
This doesn't use --signal=B:USR1@60, since the signal needs to go to the worker process rather than the batch script in my case. I didn't test it, but all MPI workers should receive the warning as well, allowing them to abandon their current work and exit. Note that you should only use this technique with applications that trap USR1: if there is no signal handler, the default action is to terminate the process with the error "User defined signal 1".
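As a quick local check of that default action (nothing Slurm-specific, and the exact message varies by shell and OS), sending USR1 to a shell with no handler installed simply kills it:
$ bash -c 'kill -USR1 $$; echo "never reached"'
User defined signal 1
$ echo $?    # 128 + 10 (SIGUSR1 on Linux)
138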
Even easier:
sbatch --signal=INT@60 -t0:2:0 batch.sh
Then the python code is:
for k in range(n):
    try:
        ... # do work for step k
    except KeyboardInterrupt:
        print("end early")
        break
As well as being easier to write, it also stops the current iteration immediately so you have a predictable amount of time to save state.
Upvotes: 3
Views: 2366
Reputation: 173
A few things.
If you use proctrack/cgroup, you can trap the SIGTERM signal that is sent when the time limit is up. That gives you a configurable amount of time to save state: SIGKILL is sent after KillWait seconds, which is configured in slurm.conf. However, this is difficult to make work with proctrack/linuxproc, because that plugin delivers SIGTERM to all processes, not just the bash script. Something like this:
#!/bin/bash
function sigterm {
    echo "SIGTERM"
    # save state
}
trap sigterm TERM

srun work.sh &
# This loop only breaks when all subprocesses exit
until wait; do :; done
This can be finicky to get right if you've never trapped signals in bash before. With proctrack/cgroup, SIGTERM is sent to the main process of each job step and to the batch script, so work.sh above would also have to trap SIGTERM. Also, bash does not act on a trapped signal until the foreground command finishes, which is why work.sh is backgrounded with '&' and the script sits in the wait loop.
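For completeness, here is a minimal sketch of what work.sh itself might look like; the long_running_computation command and the save-state step are hypothetical placeholders, not part of the setup above.
#!/bin/bash
# Hypothetical work.sh: the job step traps SIGTERM and saves its own state.
function sigterm {
    echo "work.sh: SIGTERM received, saving state"
    # write checkpoints/results here, before SIGKILL arrives KillWait seconds later
    exit 0
}
trap sigterm TERM

long_running_computation &   # hypothetical payload, backgrounded so the trap can run
until wait; do :; done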
If you really want to pass the time limit into the job, you could use an environment variable.
sbatch --export=ALL,TIMELIMIT=1-0:3:10 -t1-0:3:10 test.sh
Annoyingly, you have to specify the time limit twice.
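One way to avoid repeating yourself is to define the limit once in a shell variable (or a small wrapper script) at submission time; the variable names here are just for illustration:
# Define the limit once and pass it both as an environment variable and as -t.
LIMIT=1-0:3:10
sbatch --export=ALL,TIMELIMIT=$LIMIT -t "$LIMIT" test.sh
Inside test.sh the job can then read $TIMELIMIT and convert it to seconds with the parsing shown in the question.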
Querying the controller with squeue isn't a terrible solution. At scale, however, thousands of jobs querying the controller could impact performance. Note that you can use the --noheader flag so that TIME_LIMIT is not printed each time, instead of using tail.
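With that flag the query from the question collapses to one line (a sketch, using the same -o "%l" format string as above):
TIMELIMIT=$(squeue --noheader -j "$SLURM_JOB_ID" -o "%l")
echo "Time limit: $TIMELIMIT"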
Basically, this is what KillWait was designed for, so you should consider using it unless you can't for some reason. See https://slurm.schedmd.com/slurm.conf.html.
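KillWait itself is a cluster-wide slurm.conf parameter set by the administrator; the value below is only an example:
# slurm.conf (admin-controlled): seconds between SIGTERM and SIGKILL at the time limit
KillWait=120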
The best answer might be use of the --signal option for sbatch. This allows you to send a configurable signal to your job a certain amount of time before the end of the time limit.
sbatch --signal=B:USR1@120 myscript.sh
The example above sends USR1 to the batch script about 2 minutes before the end of the job. As noted in the man page, the resolution on this is 60 seconds, so the signal could be sent up to 60 seconds early.
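Here is a hedged sketch of a myscript.sh that acts on that signal; the save-state step is hypothetical, and the background-and-wait pattern from earlier is reused so bash can run the trap promptly:
#!/bin/bash
# Pairs with: sbatch --signal=B:USR1@120 myscript.sh
function usr1_handler {
    echo "USR1 received: roughly two minutes left, saving state"
    # hypothetical checkpoint/save-state step goes here
}
trap usr1_handler USR1

srun work.sh &
# Background + wait so bash can handle the signal while the step is still running
until wait; do :; done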
Upvotes: 2