Reputation: 11
I'm setting up Slurm on a cluster of Raspberry Pi 4s. I have succeeded in configuring and using Slurm on my 24-node RPi cluster, allowing 4 MPI tasks per RPi. So if I make an MPI run (using either "srun" or "sbatch" with a batch script) across all the nodes (-N 24) and all the cores (-n 96), it works as I would expect.
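For example, the following full-cluster run works (here "./mpi_prog" is just a stand-in name for whatever MPI program is being run):
srun -N 24 -n 96 ./mpi_prog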
A typical "slurm.conf" entry for an RPi node looks like:
NodeName=foo-001 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
There are 24 node entries of this form. The overall partition entry is:
PartitionName=general Nodes=foo-[001-024] Default=YES MaxTime=INFINITE State=UP
I would like to run more than 4 MPI tasks per RPi (to test overloading the cores). I haven't been able to set up the Slurm configuration to do this.
I don't want to use multiple threads per MPI task. I want to be able to set up Slurm so that it allows 8 or 16 MPI tasks per node, for a total of 192 or 384 MPI tasks, respectively.
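In other words, I'd like an oversubscribed run like the following (again with a stand-in program name) to go through, rather than being rejected for requesting more tasks than there are cores:
srun -N 24 -n 192 ./mpi_prog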
Here is my overall "slurm.conf" file:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=foo-001
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=tjl-pi-pharm
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# ARRAY LIMITS
MaxArraySize=100000
MaxJobCount=1000000
#
#
# COMPUTE NODES
NodeName=foo-001 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-002 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-003 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-004 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-005 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-006 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-007 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-008 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-009 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-010 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-011 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-012 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-013 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-014 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-015 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-016 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-017 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-018 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-019 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-020 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-021 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-022 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-023 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=foo-024 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=general Nodes=foo-[001-024] Default=YES MaxTime=INFINITE State=UP
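For reference, the layout Slurm believes a node has can be checked with scontrol; with the configuration above it reports CPUTot=4 for each node:
scontrol show node foo-001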
I'm adding some additional information to clarify what I'm trying to achieve.
For my current work, I'm using the Slurm "array" capability. For example, the start of my batch script (submitted using "sbatch") is:
#!/bin/bash
#SBATCH -J bb
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -a 0-9970%96
#SBATCH -t 4:00:00
#SBATCH -o enum-%A_%a.txt
...
This works and causes the "..." in the script to be executed on all 96 cores independently (this is an embarrassingly parallel set of computations). I'd like to replace "%96" with "%192" and have 192 jobs run on the 96 cores simultaneously. This doesn't currently happen with my Slurm configuration: Slurm still runs only 96 jobs at a time and fills in as these complete. I have tried the "-O"/"--overcommit" flag to "sbatch" and it doesn't seem to change the behavior.
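Concretely, the variant of the header I tried looks like this (only the changed and added lines shown):
#SBATCH -a 0-9970%192
#SBATCH --overcommit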
Upvotes: 1
Views: 813
Reputation: 1685
Instead of changing your Slurm configuration, you could instruct Slurm to allocate multiple tasks per core. In the allocation (i.e. the jobscript) you can add --ntasks-per-core=4 and start the MPI program with the srun parameter --overcommit.
Example jobscript:
#!/bin/bash
[...]
# Request all 24 nodes and allow up to 4 tasks to share each core.
#SBATCH -N 24
#SBATCH --ntasks-per-core=4
#SBATCH -n 96
[...]
# Launch 4 MPI ranks per core (384 in total) onto the 96 allocated cores.
srun --overcommit -n 384 ./your-prog
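To sanity-check the placement before running the real program, a stand-in command such as hostname works; each of the 24 node names should appear 16 times in the output:
srun --overcommit -n 384 hostname | sort | uniq -c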
Upvotes: 0