Reputation: 313
Is there a way to automatically kill an inactive job in Slurm? Or, to put it another way, to automatically extend a job's time limit as long as it is consuming a reasonable amount of CPU or the user is interacting with it? For example, through some setting in slurm.conf.
The use case is that during an interactive session (srun --pty), I want the session to stay alive within the time limit as long as I am working in it. But if I have not touched it for, say, 4 hours, then it is safe to kill the session (i.e. the job).
Upvotes: 3
Views: 1204
Reputation: 59110
Slurm does not have a feature that directly implements this, but you could rely on the Bash TMOUT mechanism.
TMOUT is a Bash variable that you can set to the number of seconds the prompt waits for input before the shell terminates. In practice, setting, for instance, export TMOUT=60 at the beginning of an interactive Bash session will end the session whenever no command is entered for 60 seconds.
[user@cluster ~]$ srun --pty bash
srun: job 11111111 queued and waiting for resources
srun: job 11111111 has been allocated resources
[user@node024 ~]$ export TMOUT=10
[user@node024 ~]$ echo "Let's wait doing nothing"
Let's wait doing nothing
[user@node024 ~]$ timed out waiting for input: auto-logout
[user@cluster ~]$
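For the 4-hour case from the question, a user could set this up without admin help. The following is a minimal sketch of a ~/.bashrc fragment, assuming the interactive shell started by srun reads ~/.bashrc and that Slurm sets SLURM_JOB_ID inside the job; the function name slurm_idle_timeout is made up for illustration.

```shell
# ~/.bashrc fragment (sketch): enable an idle timeout only inside Slurm jobs.
# slurm_idle_timeout is a hypothetical helper name, not part of Slurm or Bash.
slurm_idle_timeout() {
    # Print the idle limit in seconds when running inside a Slurm job,
    # and print nothing otherwise (e.g. on the login node).
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo $((4 * 60 * 60))
    fi
}

timeout_seconds="$(slurm_idle_timeout)"
if [ -n "$timeout_seconds" ]; then
    # Bash logs out an interactive shell that sits at the prompt this long.
    export TMOUT="$timeout_seconds"
fi
```

Note that TMOUT only counts time spent waiting at the prompt. A long-running foreground command does not trigger the timeout, and CPU usage is not taken into account.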
If you are the admin and want to enforce this for all users, you can use a task prolog to inject the TMOUT variable into the job's shell environment. /!\ Make sure to test that idea thoroughly before rolling it out to users.
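As a rough sketch of the task prolog idea: Slurm interprets lines of the form "export NAME=value" printed on the task prolog's standard output as environment variables for the task being launched. The script below assumes a 4-hour limit; the function name emit_tmout and the script path mentioned afterwards are hypothetical.

```shell
#!/bin/bash
# Hypothetical task prolog script (sketch, untested on a real cluster).
# Slurm reads this script's stdout; a line "export NAME=value" sets NAME
# in the environment of the task that is about to start.
emit_tmout() {
    # 4 hours of idle time, expressed in seconds (assumed value).
    local idle_limit_seconds=$((4 * 60 * 60))
    echo "export TMOUT=${idle_limit_seconds}"
}

emit_tmout
```

The script would then be referenced from slurm.conf with something like TaskProlog=/etc/slurm/task_prolog.sh, where the path is only an example. Since this applies to every task, it is worth checking how batch scripts that use the read builtin behave, because TMOUT also serves as the default timeout for read when input comes from a terminal.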
Upvotes: 2