Reputation: 27611
I'd like to create an array of SLURM workers, and whenever one of those workers finishes its work, I'd like to restart the worker.
If it were possible to run jobs of infinite duration on my queue, I'd of course do that instead, but because this isn't possible, I thought I'd just create an infinite series of workers.
Is this possible in SLURM? I thought I could submit an sbatch command from inside the last worker in my worker array to just restart the entire sequence, but the compute nodes that workers run on in my cluster don't have access to the sbatch callable.
Any pointers on this question would be super helpful!
Upvotes: 0
Views: 745
Reputation: 59260
To complement @Marcus Boden's answer: Many people set up a cron job on the login node to periodically check the queue status and resubmit jobs if necessary. A scrontab command might be made available in a future Slurm release to help with this use case.
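A minimal sketch of the cron approach described above, assuming the login node has Python 3 alongside squeue and sbatch. The job name, batch script name, and array range are hypothetical placeholders, not anything from the question. It resubmits the whole array only once no job with that name is left in the queue:

```python
import getpass
import shutil
import subprocess

# Hypothetical placeholders: adjust to your own setup.
JOB_NAME = "worker-array"
BATCH_SCRIPT = "workers.sbatch"
ARRAY_RANGE = "0-9"


def count_active(squeue_output: str) -> int:
    """Count non-empty lines of `squeue --noheader` output (one line per listed job)."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def resubmit_if_needed() -> None:
    """Resubmit the worker array if none of its jobs are pending or running."""
    if shutil.which("squeue") is None or shutil.which("sbatch") is None:
        print("squeue/sbatch not found on this node; nothing to do")
        return
    # List this user's jobs with the given name, one job ID per line, no header.
    result = subprocess.run(
        ["squeue", "--noheader", "--user", getpass.getuser(),
         "--name", JOB_NAME, "--format", "%i"],
        check=True, capture_output=True, text=True,
    )
    if count_active(result.stdout) == 0:
        subprocess.run(
            ["sbatch", "--job-name", JOB_NAME, "--array", ARRAY_RANGE, BATCH_SCRIPT],
            check=True,
        )


if __name__ == "__main__":
    resubmit_if_needed()
```

You would then call the script from your crontab on the login node at whatever interval your admins are comfortable with, for example every five minutes. Whether cron is allowed on the login node at all is cluster policy, so check first.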
Upvotes: 2
Reputation: 1685
There is no way built into Slurm to do that. There may still be some tricks to get around this, depending on your cluster. Before you try any of them, please talk to your cluster admins first, as these tricks are rather hacky and there may be a reason why your admins decided not to make sbatch available on your compute nodes.
But to reiterate: Please ask your sysadmins first! We generally won't bite your head off.
Upvotes: 2