wonder

Reputation: 903

Viewing the user-requested Slurm job priority

I submitted the following job in Slurm:

testuser1@dev-0:~$ sbatch --priority=10 --cpus-per-task=10 --wrap="/bin/sleep 300"
Submitted batch job 18

When I run scontrol show job on this job, I don't see the priority value I submitted:

testuser1@dev-0:~$ scontrol show job 18
JobId=18 JobName=wrap
UserId=testuser1(1000) GroupId=tstgrp00(1000) MCS_label=N/A
Priority=4294901751 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:15 TimeLimit=365-00:00:00 TimeMin=N/A
SubmitTime=2023-10-03T09:59:44 EligibleTime=2023-10-03T09:59:44
AccrueTime=2023-10-03T09:59:44
StartTime=2023-10-03T09:59:44 EndTime=2024-10-02T09:59:44 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-10-03T09:59:44 Scheduler=Backfill
Partition=debug AllocNode:Sid=dev-0:116
ReqNodeList=(null) ExcNodeList=(null)
NodeList=dev-0
BatchHost=dev-0
NumNodes=1 NumCPUs=10 NumTasks=1 CPUs/Task=10 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=10,mem=10M,node=1,billing=10
AllocTRES=cpu=10,mem=10M,node=1,billing=10
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=10 MinMemoryCPU=1M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/home/testuser1
StdErr=/home/testuser1/slurm-18.out
StdIn=/dev/null
StdOut=/home/testuser1/slurm-18.out
Power=

The scheduler type in my slurm.conf is sched/backfill. Jobs are being scheduled according to the user-requested priority, but scontrol show job does not show the exact value the user requested. sacct shows the same large value. Is there a way to view the original value in scontrol?

Upvotes: 0

Views: 542

Answers (1)

j23

Reputation: 3530

If the printed priority value is, say, X,

then the actual priority can be calculated as

 UINT_MAX - UINT16_MAX + 1 - X

So, in your case, it is

 UINT_MAX - UINT16_MAX + 1 - 4294901751

which expands to

 4294967295 - 65535 + 1 - 4294901751 # gives 10
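
To make the arithmetic easy to repeat for other jobs, here is a minimal Python sketch of the same formula. The function names `requested_priority` and `displayed_priority` are hypothetical helpers, not part of Slurm. The formula itself is the assumption described in this answer, not documented Slurm behaviour.

```python
import re

UINT_MAX = 4294967295    # largest 32-bit unsigned value
UINT16_MAX = 65535       # largest 16-bit unsigned value


def requested_priority(displayed: int) -> int:
    """Apply the formula from this answer: UINT_MAX - UINT16_MAX + 1 - displayed."""
    return UINT_MAX - UINT16_MAX + 1 - displayed


def displayed_priority(scontrol_text: str) -> int:
    """Extract the Priority= field from `scontrol show job` output (hypothetical helper)."""
    match = re.search(r"\bPriority=(\d+)", scontrol_text)
    if match is None:
        raise ValueError("no Priority= field found")
    return int(match.group(1))


# Line copied from the scontrol output in the question.
sample = "Priority=4294901751 Nice=0 Account=(null) QOS=(null)"
print(requested_priority(displayed_priority(sample)))  # prints 10
```

In practice you could pass the full text of `scontrol show job <jobid>` to `displayed_priority` instead of the single sample line.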

Slurm stores the priority as an unsigned integer. My assumption (I didn't check the entire code) is that, when the result is displayed, a mismatch between types or format specifiers (unsigned int versus unsigned short in the different data structures that store job information) makes the priority appear as a large value in the output, or perhaps this is intentional. Reversing the calculation as shown above recovers the requested value.

Maximum values:
    UINT_MAX 4294967295
    UINT16_MAX 65535

This is my assumption!

Upvotes: 0
