Reputation: 903
I submitted the job below in Slurm:
testuser1@dev-0:~$ sbatch --priority=10 --cpus-per-task=10 --wrap="/bin/sleep 300"
Submitted batch job 18
When I run scontrol show job on it, I don't see the priority value I submitted.
testuser1@dev-0:~$ scontrol show job 18
JobId=18 JobName=wrap
UserId=testuser1(1000) GroupId=tstgrp00(1000) MCS_label=N/A
Priority=4294901751 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:15 TimeLimit=365-00:00:00 TimeMin=N/A
SubmitTime=2023-10-03T09:59:44 EligibleTime=2023-10-03T09:59:44
AccrueTime=2023-10-03T09:59:44
StartTime=2023-10-03T09:59:44 EndTime=2024-10-02T09:59:44 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-10-03T09:59:44 Scheduler=Backfill
Partition=debug AllocNode:Sid=dev-0:116
ReqNodeList=(null) ExcNodeList=(null)
NodeList=dev-0
BatchHost=dev-0
NumNodes=1 NumCPUs=10 NumTasks=1 CPUs/Task=10 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=10,mem=10M,node=1,billing=10
AllocTRES=cpu=10,mem=10M,node=1,billing=10
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=10 MinMemoryCPU=1M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/home/testuser1
StdErr=/home/testuser1/slurm-18.out
StdIn=/dev/null
StdOut=/home/testuser1/slurm-18.out
Power=
The scheduler type in my slurm.conf is sched/backfill. Jobs are being scheduled according to the user-requested priority, but I don't see the exact value the user requested in scontrol show job. I see the same value with sacct. Is there a way to view the original value in scontrol?
Upvotes: 0
Views: 542
Reputation: 3530
If your printed priority value is, say, X, then the actual priority can be calculated as
UINT_MAX - UINT16_MAX + 1 - X
So, in your case, it will be,
UINT_MAX - UINT16_MAX + 1 - 4294901751
which, with the numeric values substituted, is
4294967295 - 65535 + 1 - 4294901751 #will give 10
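The same arithmetic as a small Python sketch, in case you want to convert several values. The function name `requested_priority` is made up for illustration, and the formula is only the one from this answer (which is my assumption), not something taken from the Slurm documentation.

```python
# Constants used in this answer's formula.
UINT_MAX = 4294967295    # 2**32 - 1
UINT16_MAX = 65535       # 2**16 - 1


def requested_priority(displayed: int) -> int:
    """Apply this answer's formula: UINT_MAX - UINT16_MAX + 1 - X.

    `displayed` is the Priority= value printed by scontrol show job.
    This mapping is an assumption, not documented Slurm behaviour.
    """
    return UINT_MAX - UINT16_MAX + 1 - displayed


# Priority=4294901751 from the question's output
print(requested_priority(4294901751))  # prints 10
```

Note that UINT_MAX - UINT16_MAX is 4294901760 (0xFFFF0000), so the formula amounts to measuring how far the displayed value sits below that number, plus one.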
Slurm stores the priority as an unsigned integer. When the value is displayed (THE FOLLOWING IS MY ASSUMPTION - I didn't check the entire code :)), mixing types such as unsigned int and unsigned short across the format specifiers and data structures that hold job information probably distorts it, so the priority shows up as a large number in the output. Or maybe it was intended to work that way. Either way, reversing the calculation as described above gets you the original value.
MAX Values:
UINT_MAX 4294967295
UINT16_MAX 65535
This is my assumption!
Upvotes: 0