Reputation: 437
We have 4 GPU nodes, each with two 36-core CPUs and 200 GB of RAM, available at our local cluster. When I try to submit a job with the following configuration:
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500MB
#SBATCH --gres=gpu:4
#SBATCH --time=0-10:00:00
I'm getting the following error:
sbatch: error: Batch job submission failed: Requested node configuration is not available
What might be the reason for this error? The nodes have exactly the kind of hardware that I need...
Upvotes: 14
Views: 49291
Reputation: 59110
The CPUs most likely have 36 threads, not 36 cores, and Slurm is probably configured to allocate cores rather than threads.
Check the output of scontrol show nodes
to see what the nodes really offer.
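The full output is long, so as a rough sketch you could filter it down to the fields that matter here. The helper name summarize_nodes is made up for this example. The field names (CPUTot, Sockets, CoresPerSocket, ThreadsPerCore, RealMemory, Gres) are the ones scontrol normally prints, but the exact layout can differ between Slurm versions, so treat this as a starting point.

```shell
# Hypothetical helper: reads `scontrol show nodes` output on stdin and keeps
# only the fields that decide how many tasks fit on one node.
summarize_nodes() {
    grep -oE '(NodeName|CPUTot|Sockets|CoresPerSocket|ThreadsPerCore|RealMemory|Gres)=[^ ]+'
}

# Usage on a login node of the cluster:
#   scontrol show nodes | summarize_nodes
```

If ThreadsPerCore is 2 and Slurm allocates whole cores, the number of tasks that fit with --cpus-per-task=1 is Sockets x CoresPerSocket, not CPUTot.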
Upvotes: 12
Reputation: 1159
You're requesting 40 tasks on nodes with 36 CPUs. The default Slurm configuration binds tasks to cores, so reducing the number of tasks to 36 or fewer may work. (Or increase the number of nodes to 2, if your application can handle that.)
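To make the arithmetic explicit, here is a small sketch that compares a single-node request with what the node offers. check_request is a made-up helper, not a Slurm command, and the node values in the usage lines (36 allocatable CPUs, 200000 MB) are assumptions taken from this thread; use the numbers your own nodes report.

```shell
# Hypothetical helper, not part of Slurm.
# check_request NTASKS CPUS_PER_TASK MEM_PER_CPU_MB NODE_CPUS NODE_MEM_MB
check_request() {
    cr_need_cpus=$(( $1 * $2 ))            # total CPUs the job asks for
    cr_need_mem=$(( cr_need_cpus * $3 ))   # --mem-per-cpu applies to every CPU
    if [ "$cr_need_cpus" -gt "$4" ]; then
        echo "too many CPUs: need $cr_need_cpus, node offers $4"
        return 1
    fi
    if [ "$cr_need_mem" -gt "$5" ]; then
        echo "too much memory: need $cr_need_mem MB, node offers $5 MB"
        return 1
    fi
    echo "fits: $cr_need_cpus CPUs and $cr_need_mem MB requested"
}

# The request from the question, against an assumed 36-CPU, 200000 MB node:
#   check_request 40 1 1500 36 200000
# The same request reduced to 36 tasks:
#   check_request 36 1 1500 36 200000
```

With those assumed node values, the memory request (40 x 1500 MB = 60000 MB) is well within limits, so the CPU count is the only thing that can be rejected.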
Upvotes: 1