Solving SLURM "sbatch: error: Batch job submission failed: Requested node configuration is not available" error

We have 4 GPU nodes, each with two 36-core CPUs and 200 GB of RAM, available at our local cluster. When I try to submit a job with the following configuration:

#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500MB
#SBATCH --gres=gpu:4
#SBATCH --time=0-10:00:00

I'm getting the following error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

What might be the reason for this error? The nodes have exactly the kind of hardware that I need...

Upvotes: 14

Views: 49291

Answers (2)

damienfrancois

Reputation: 59110

The CPUs are most likely 36-thread, not 36-core, and Slurm is probably configured to allocate cores rather than threads.

Check the output of scontrol show nodes to see what the nodes really offer.
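A quick way to inspect that (the node name gpu01 is just a placeholder for one of your nodes; the fields shown are standard in scontrol output):

scontrol show node gpu01 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore|RealMemory'

If it reports, for example, Sockets=2, CoresPerSocket=18 and ThreadsPerCore=2, then each "36-core CPU" is really an 18-core/36-thread processor, Slurm sees only 36 allocatable cores per node, and a 40-CPU request on a single node cannot be satisfied.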

Upvotes: 12

ciaron

Reputation: 1159

You're requesting 40 tasks on nodes with 36 CPUs. The default SLURM configuration binds tasks to cores, so reducing the tasks to 36 or fewer may work. (Or increase --nodes to 2, if your application can handle that.)
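For instance, a reduced request that should fit on one 36-CPU node (a sketch only; the memory, GPU, and time values are copied from the question, and you should confirm the real allocatable CPU count with scontrol show nodes first):

#SBATCH --nodes=1
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500MB
#SBATCH --gres=gpu:4
#SBATCH --time=0-10:00:00

If you instead switch to --nodes=2 to keep all 40 tasks, note that --gres=gpu:4 is requested per node, so that variant asks for 8 GPUs in total.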

Upvotes: 1
