Mayank Jain

Reputation: 2564

Slurm: cannot allocate resources even when they are available

I am trying to run a socket-programming application on a cluster that uses SLURM for node allocation. I used the following Slurm script:

#!/bin/bash
#SBATCH --job-name="abcd"
#SBATCH --ntasks=2
#SBATCH --nodes=2-2
#SBATCH --cpus-per-task=128
#SBATCH --partition=knl
./a.out

When I submit this with sbatch, I get the error "sbatch: error: Batch job submission failed: Requested node configuration is not available".

However, I do see some nodes that satisfy the above configuration. The scontrol output for two such nodes is shown below:

NodeName=compute140 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=20 CPUErr=0 CPUTot=256 CPULoad=20.01
   AvailableFeatures=knl
   ActiveFeatures=knl
   Gres=(null)
   NodeAddr=compute140 NodeHostName=compute140 Version=16.05
   OS=Linux RealMemory=96000 AllocMem=81920 FreeMem=102580 Sockets=1 Boards=1
   MemSpecLimit=1024
   State=MIXED ThreadsPerCore=4 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2018-06-04T12:41:22 SlurmdStartTime=2018-06-04T12:47:01
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


NodeName=compute141 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=20 CPUErr=0 CPUTot=256 CPULoad=20.01
   AvailableFeatures=knl
   ActiveFeatures=knl
   Gres=(null)
   NodeAddr=compute141 NodeHostName=compute141 Version=16.05
   OS=Linux RealMemory=96000 AllocMem=81920 FreeMem=87441 Sockets=1 Boards=1
   MemSpecLimit=1024
   State=MIXED ThreadsPerCore=4 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2018-06-04T12:46:37 SlurmdStartTime=2018-06-04T12:52:11
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

I am not sure why I am getting this error when Slurm should be able to allocate the requested configuration.

I want to run a client-server application on two different knl nodes. Each task would be multithreaded, with 128 threads per task.

Please help; I have tried several things, but nothing has worked for me.

Upvotes: 5

Views: 4963

Answers (1)

damienfrancois

Reputation: 59250

You do not explicitly specify the memory requirement per CPU, so the default applies. If the default is larger than RealMemory divided by the number of CPUs you request per node, in your case 96000 MB / 128 = 750 MB, then a task cannot fit on a single node.
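As a rough check of that threshold, the arithmetic can be sketched in bash. The numbers come from the scontrol output and the job script in the question; the variable names are only illustrative, and the command in the comment is one way the configured default might be inspected on your site (the exact parameter shown depends on how the cluster is configured).

```shell
#!/bin/bash
# Values taken from the question (scontrol output and #SBATCH lines).
real_memory_mb=96000   # RealMemory of compute140 / compute141
cpus_per_task=128      # --cpus-per-task
tasks_per_node=1       # 2 tasks on 2 nodes, so one task per node

# Largest per-CPU memory for which one task still fits on one node.
max_mem_per_cpu_mb=$(( real_memory_mb / (cpus_per_task * tasks_per_node) ))
echo "A task fits only if memory per CPU <= ${max_mem_per_cpu_mb} MB"

# To see which default applies on the cluster (not run here), something like:
#   scontrol show config | grep -i MemPer
```

Note that MemSpecLimit=1024 in the node description likely reduces the memory available to jobs a little further, so the real limit may be slightly below this figure.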

So if the default is 4 GB/CPU, and you request one task per node with 128 CPUs per task, you effectively request 512 GB (524288 MB) of RAM per node, which your cluster cannot offer.
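One way to avoid depending on the default is to state the memory request in the script itself. The following is a sketch based on the script in the question, with only the `--mem-per-cpu` line added. The value 700 is an assumption chosen so that 700 MB x 128 CPUs = 89600 MB stays below the node's RealMemory of 96000 MB; pick whatever your application actually needs.

```shell
#!/bin/bash
#SBATCH --job-name="abcd"
#SBATCH --ntasks=2
#SBATCH --nodes=2-2
#SBATCH --cpus-per-task=128
#SBATCH --partition=knl
# Explicit per-CPU memory in MB (illustrative value, not from the question):
# 700 MB * 128 CPUs = 89600 MB per node, below RealMemory=96000.
#SBATCH --mem-per-cpu=700
./a.out
```

With a request like this the submission should no longer be rejected for memory reasons, although the job may still wait in the queue if the nodes' memory is currently allocated to other jobs (AllocMem=81920 in the output above).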

Upvotes: 3
