AksR

Reputation: 55

Is it possible to modify or add a layer to SLURM scheduling?

I am a non-paying user on a computing cluster that uses SLURM.

Occasionally, I've had multiple long-running jobs that clogged up the queue for paying users, and because of this the admin has cancelled some of my jobs. There is now a cap on the number of nodes available to me. While I don't argue with the equity of this arrangement, it is a problem for me in terms of getting work done, especially because I see free nodes that are not running any jobs while my own jobs sit waiting to get through the node cap.

With that as background info, here are my two questions:

  1. Isn't it possible for the admin to suspend and then resume jobs: a single job, all jobs of a user, or a set of jobs? Is this suspend / resume onerous from the admin's perspective?

  2. I suppose it should be possible to create a list of paying vs. non-paying users, so that when a paying user submits with sbatch, SLURM is automatically instructed to suspend the non-paying users' jobs and to resume them once the paying user's jobs have completed. Is this even possible? If yes, is it outside the skill scope of regular SLURM / farm admins?

Could someone please suggest any other solutions (if what I have asked above is unreasonable or absurd)?

Thank you!

Upvotes: 1

Views: 261

Answers (1)

damienfrancois

Reputation: 59360

  1. Yes. The admin can run scontrol suspend <jobid> and later scontrol resume <jobid>.

  2. The keywords here are 'QOS' and 'preemption'. Typically a QOS is created for the paying users that has preemptive rights over the normal QOS. Depending on how preemption is configured, the preempted jobs of the non-paying users are then cancelled, checkpointed, requeued, or suspended.
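To illustrate point 1 for the "all jobs of a user" case, here is a minimal Python sketch of how an admin might wrap the two commands. The helper names (`job_ids_for_user`, `scontrol_command`, `suspend_user_jobs`) and the username are hypothetical, made up for this example. It assumes `squeue` and `scontrol` are on the PATH and that the caller has the privileges needed to suspend other users' jobs.

```python
import subprocess


def job_ids_for_user(user, state):
    """Return the IDs of a user's jobs in the given state, as reported by squeue."""
    # -h drops the header line, -o %i prints only the job ID,
    # -t filters by job state (e.g. RUNNING or SUSPENDED).
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i", "-t", state],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def scontrol_command(action, job_ids):
    """Build the scontrol argument list for suspending or resuming jobs."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action!r}")
    if not job_ids:
        raise ValueError("no job IDs given")
    # scontrol accepts a comma-separated list of job IDs.
    return ["scontrol", action, ",".join(job_ids)]


def suspend_user_jobs(user):
    """Suspend every running job of `user`; return the affected job IDs."""
    ids = job_ids_for_user(user, "RUNNING")
    if ids:
        subprocess.run(scontrol_command("suspend", ids), check=True)
    return ids


def resume_user_jobs(user):
    """Resume every suspended job of `user`; return the affected job IDs."""
    ids = job_ids_for_user(user, "SUSPENDED")
    if ids:
        subprocess.run(scontrol_command("resume", ids), check=True)
    return ids


if __name__ == "__main__":
    # "some_user" is a placeholder username, not taken from the question.
    print(suspend_user_jobs("some_user"))
```

This only covers the manual route. The automatic behaviour asked about in point 2 is better left to the scheduler's own QOS preemption than to a script like this.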

Upvotes: 0
