Hayden Oliver

Reputation: 1

Slurm job array submission severely underutilizing available resources

The Slurm job array submission isn't working as I expected. When I run my sbatch script to create the array and run the programs, I expect it to fully utilize all the cores that are available, but it only allows one job from the array to run on a given node at a time. scontrol shows the job using all 36 cores on the node even though I specified 4 cores for the process. Additionally, I want to restrict the jobs to one specific node, but if other nodes are unused, Slurm will submit a job onto them as well, using every core available on that node.

I've tried submitting the jobs by changing the parameters for --nodes, --ntasks, --nodelist, --ntasks-per-node, --cpus-per-task, setting OMP_NUM_THREADS, and specifying the number of cores for mpirun directly. None of these options seemed to change anything at all.

#!/bin/bash
#SBATCH --time=2:00:00   # walltime
#SBATCH --ntasks=1   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1    # number of nodes
#SBATCH --nodelist node001   # run only on node001
#SBATCH --ntasks-per-node=9   # tasks per node
#SBATCH --cpus-per-task=4   # cores per task
#SBATCH --mem-per-cpu=500MB   # memory per CPU core

#SBATCH --array=0-23%8   # 24 array tasks, at most 8 running at once

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one OpenMP thread per allocated core

mpirun -n 4 MYPROGRAM   # run MYPROGRAM with 4 MPI ranks

I expected to be able to run eight instances of MYPROGRAM, each utilizing four cores for a parallel operation. In total, I expected to use 32 cores at a time for MYPROGRAM, plus however many cores are needed to run the job submission program.

Instead, my squeue output looks like this:

JOBID          PARTITION    NAME      USER   ST   TIME  NODES CPUS
  num_[1-23%6]  any      MYPROGRAM   user   PD   0:00      1 4
  num_0         any      MYPROGRAM   user    R   0:14      1 36

It says that I am using all available cores on the node for this process, and will not allow additional array jobs to begin. While MYPROGRAM runs exactly as expected, there is only one instance of it running at any given time.

And my scontrol output looks like this:

   UserId=user(225589) GroupId=domain users(200513) MCS_label=N/A
   Priority=4294900562 Nice=0 Account=(null) QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2019-06-21T18:46:25 EligibleTime=2019-06-21T18:46:26
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-06-21T18:46:28
   Partition=any AllocNode:Sid=w***:45277
   ReqNodeList=node001 ExcNodeList=(null)
   NodeList=(null) SchedNodeList=node001
   NumNodes=1-1 NumCPUs=4 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=2000M,node=1
   Socks/Node=* NtasksPerN:B:S:C=9:0:*:* CoreSpec=*
   MinCPUsNode=36 MinMemoryCPU=500M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)

   Power=

JobId=1694 ArrayJobId=1693 ArrayTaskId=0 JobName=launch_vasp.sh
   UserId=user(225589) GroupId=domain users(200513) MCS_label=N/A
   Priority=4294900562 Nice=0 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:10 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2019-06-21T18:46:25 EligibleTime=2019-06-21T18:46:26
   StartTime=2019-06-21T18:46:26 EndTime=2019-06-21T20:46:26 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-06-21T18:46:26
   Partition=any AllocNode:Sid=w***:45277
   ReqNodeList=node001 ExcNodeList=(null)
   NodeList=node001
   BatchHost=node001
   NumNodes=1 NumCPUs=36 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
   TRES=cpu=36,mem=18000M,node=1,billing=36
   Socks/Node=* NtasksPerN:B:S:C=9:0:*:* CoreSpec=*
   MinCPUsNode=36 MinMemoryCPU=500M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)

   Power=

Something is going wrong in how Slurm is assigning cores to tasks, but nothing I've tried changes anything. I'd appreciate any help you can give.

Upvotes: 0

Views: 1087

Answers (1)

Aaron Caba

Reputation: 73

Check whether the slurm.conf file allows consumable resources. The default is to assign nodes to jobs exclusively. I had to add the following lines to allow per-core scheduling:

SelectType=select/cons_res
SelectTypeParameters=CR_Core
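
To see what the cluster is currently using and to apply the change, something like the following should work (the slurm.conf path and the systemctl service names are typical defaults and may differ on your cluster):

# See which selection plugin the controller is using now
scontrol show config | grep -i SelectType

# Edit slurm.conf (often /etc/slurm/slurm.conf) on the controller and on
# every compute node so that it contains:
#   SelectType=select/cons_res
#   SelectTypeParameters=CR_Core

# A SelectType change normally requires restarting the Slurm daemons,
# not just "scontrol reconfigure" (assuming systemd-managed services):
sudo systemctl restart slurmctld   # on the controller
sudo systemctl restart slurmd      # on each compute node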

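Separately, the submission script itself asks for more than one instance's worth of cores: --ntasks-per-node=9 together with --cpus-per-task=4 makes each array element request 9 x 4 = 36 CPUs (which matches MinCPUsNode=36 in the scontrol output), so only one element fits on the node even with per-core scheduling. A minimal per-element request for the stated goal of eight concurrent 4-core runs might look like the sketch below, assuming MYPROGRAM is a 4-rank MPI program:

#!/bin/bash
#SBATCH --time=2:00:00        # walltime
#SBATCH --nodes=1             # each array element stays on one node
#SBATCH --nodelist=node001    # restrict to node001
#SBATCH --ntasks=4            # four MPI ranks per array element
#SBATCH --cpus-per-task=1     # one core per rank
#SBATCH --mem-per-cpu=500MB   # memory per CPU core
#SBATCH --array=0-23%8        # 24 elements, at most 8 running at once

mpirun -n 4 MYPROGRAM

If MYPROGRAM is actually a threaded (OpenMP) code rather than MPI, the equivalent shape would be --ntasks=1 with --cpus-per-task=4 and OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK.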
Upvotes: 0
