duhaime

Reputation: 27611

SLURM: Restart worker after the worker completes

I'd like to create an array of SLURM workers, and whenever one of those workers finishes its work, I'd like to restart the worker.

If it were possible to run jobs of infinite duration on my queue, I'd of course do that instead, but because this isn't possible, I thought I'd just create an infinite series of workers.

Is this possible in SLURM? I thought I could submit an sbatch command from inside the last worker in my worker array to just restart the entire sequence, but the compute nodes that workers run on in my cluster don't have access to the sbatch callable.

Any pointers on this question would be super helpful!

Upvotes: 0

Views: 745

Answers (2)

damienfrancois

Reputation: 59260

To complement @Marcus Boden's answer: many people set up a cron job on the login node that periodically checks the queue status and resubmits jobs if necessary. A scrontab command might be made available in a future Slurm release to help with this use case.
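As a rough illustration of the cron approach, here is a minimal Python sketch of a one-shot check that cron could run on the login node. It assumes `squeue` and `sbatch` are on the `PATH` there; the script name `check_workers.py`, the job name `worker-array`, and the batch script `worker_array.sh` are hypothetical placeholders, not anything from the question.

```python
#!/usr/bin/env python3
"""One-shot queue check, meant to be run periodically by cron on a login node."""
import getpass
import subprocess
import sys


def count_jobs(job_name, user, run=subprocess.run):
    """Return how many of the user's pending or running jobs are named job_name."""
    result = run(
        ["squeue", "--noheader", "--user", user, "--name", job_name,
         "--format", "%i"],
        capture_output=True, text=True, check=True,
    )
    # With --noheader, squeue prints one line per matching job (or array task).
    return len([line for line in result.stdout.splitlines() if line.strip()])


def resubmit_if_idle(job_name, job_script, user, run=subprocess.run):
    """Submit job_script again only if no job named job_name is left in the queue."""
    if count_jobs(job_name, user, run) > 0:
        return False
    run(["sbatch", "--job-name", job_name, job_script], check=True)
    return True


# Hypothetical usage: check_workers.py <job-name> <batch-script>
if __name__ == "__main__" and len(sys.argv) == 3:
    resubmit_if_idle(sys.argv[1], sys.argv[2], getpass.getuser())
```

A crontab entry along the lines of `*/10 * * * * python3 $HOME/check_workers.py worker-array $HOME/worker_array.sh` would then run the check every ten minutes. Cron usually starts with a minimal environment, so the full paths to `python3`, `squeue`, and `sbatch` may need to be spelled out on your cluster.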

Upvotes: 2

Marcus Boden

Reputation: 1685

There is no built-in way to do that in Slurm. There may still be some tricks to get around this, depending on your cluster. Before you try any of them, please talk to your cluster admins, as these are rather hacky and there may be a reason why your admins decided not to make sbatch available on your compute nodes.

  1. SSH into a node where sbatch is available and resubmit the job from there. This depends on your cluster's SSH setup.
  2. Copy the sbatch binary to your home directory (or any directory that you have access to on the node) and use it there. This depends on the setup of Slurm, firewalls, and more.
  3. Run a program on the frontend that periodically checks whether your jobs are still running and resubmits them if not. Some clusters automatically kill all user processes on the frontends once the last login shell closes, so this won't work in that case.

But to reiterate: Please ask your sysadmins first! We generally won't bite your head off.
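For illustration only, the first option (resubmitting over SSH from inside the job) could look roughly like the Python sketch below. It assumes passwordless SSH from the compute node to a login node is allowed and that `sbatch` is on the `PATH` of a non-interactive shell there. The script name `resubmit.py` and the host name `login01` in the usage note are hypothetical.

```python
#!/usr/bin/env python3
"""Resubmit a batch script by running sbatch on a login node over SSH."""
import shlex
import subprocess
import sys


def resubmit_via_ssh(login_host, job_script, run=subprocess.run):
    """Run sbatch on login_host and return the new job ID as a string."""
    # --parsable makes sbatch print only the job ID (optionally ";cluster").
    remote_cmd = "sbatch --parsable " + shlex.quote(job_script)
    result = run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", login_host, remote_cmd],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().split(";")[0]


# Hypothetical usage: resubmit.py <login-host> <batch-script>
if __name__ == "__main__" and len(sys.argv) == 3:
    print(resubmit_via_ssh(sys.argv[1], sys.argv[2]))
```

The last worker in the array could call this as its final step, for example `python3 resubmit.py login01 $HOME/worker_array.sh`. Whether this is permitted at all is exactly the kind of thing to check with your admins.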

Upvotes: 2
