user321627

Reputation: 2572

I have a job in a SLURM cluster that stopped and now says "PREEMPTED". What does that mean?

I ran a job in a SLURM cluster, and for a while it was running just fine. The last time I checked with the queue command squeue, it reported:

JOBID   PARTITION NAME     USER    ST     TIME  NODES NODELIST(REASON)
2394852 serial_re CombineP user_1  R      22:29 1     bigcluster112

However, I just checked it and it now says:

JOBID   PARTITION NAME     USER    ST     TIME  NODES NODELIST(REASON)
2394852 serial_re CombineP user_1  PD      0:00 1     (Priority)

and I got an email saying the job has been "PREEMPTED". From what I found online, when a higher-priority job needs the resources, lower-priority jobs are stopped so it can run. This is on a shared university cluster, and I didn't run any other jobs myself. Does this mean someone else submitted a job that now outranks mine? How is that priority determined, and is there any way to raise or get around it? Thanks!

Upvotes: 3

Views: 5435

Answers (1)

damienfrancois

Reputation: 59260

Yes, someone submitted a job with a higher priority, or with a QOS that has preemption rights over other QOSes, or to a partition that has preemption rights over other partitions.

Look for the word 'Preempt' in the output of scontrol show config, scontrol show partitions, and sacctmgr list qos for more information.
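Something along these lines should surface the relevant settings (a rough sketch only; the sacctmgr format fields assume the accounting database is configured, and exact output varies by Slurm version):

# cluster-wide preemption settings (PreemptType, PreemptMode, ...)
scontrol show config | grep -i preempt

# per-partition priority tiers and preemption mode
scontrol show partition | grep -iE 'PartitionName|PriorityTier|PreemptMode'

# QOS-level priorities and preemption rights
sacctmgr list qos format=Name,Priority,Preempt,PreemptMode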

To know how the priority is computed, have a look at the output of scontrol show config | grep Priority and look for the corresponding keywords in the slurm.conf manpage.
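For instance (a sketch using job ID 2394852 from your squeue output; sprio only reports on pending jobs and assumes the multifactor priority plugin is in use):

# which priority plugin and weights the cluster uses
scontrol show config | grep Priority

# break your pending job's priority down into its factors (age, fairshare, QOS, ...)
sprio -j 2394852

# show the job's current priority value and pending reason
squeue -j 2394852 -o "%.18i %.10Q %.20r"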

Upvotes: 3
