Federico Marchetti

Reputation: 3

Running MPI job on multiple nodes with slurm scheduler

I'm trying to run an MPI application with a specific task/node configuration. I need to run a total of 8 MPI tasks, 4 on one node and 4 on another.

This is the script file I'm using:

#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=4

module load autoload scalapack/2.0.2--intelmpi--2018--binary intel/pe-xe-2018--binary

srun <path_to_bin> <options>

I then run this with sbatch:

sbatch mpi_test.sh

but I continue to get this error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

How can I modify this script to make it run? I'm surely missing something, but I cannot figure out what.

I'm using Intel MPI and Slurm 20.02.

Upvotes: 0

Views: 1622

Answers (1)

j23

Reputation: 3530

This error usually means the job requests a node configuration that the cluster's hardware cannot satisfy.

A potential issue is in the following lines:

#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4

Together these request 4 tasks × 4 CPUs = 16 cores per node. If no node has at least 16 cores, the above error is shown.
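You can check what the nodes actually provide before adjusting the script. A minimal sketch using standard Slurm commands (node and partition names will differ on your cluster):

```shell
# List each node with its CPU count and socket/core/thread layout
# (S:C:T column shows sockets:cores-per-socket:threads-per-core)
sinfo -N -l

# Equivalent with an explicit format string:
# %N = node name, %c = CPUs, %X = sockets, %Y = cores/socket, %Z = threads/core
sinfo -N -o "%N %c %X %Y %Z"
```

If the reported CPUs per node are fewer than 16, the `--ntasks-per-node=4` / `--cpus-per-task=4` combination cannot be satisfied.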

#SBATCH --ntasks-per-socket=1

As damienfrancois pointed out in a comment, the number of sockets can also be the problem: with --ntasks-per-node=4 and --ntasks-per-socket=1, each node must have four sockets. If the nodes do not have four sockets, the same error appears.

As a simple first step, comment out the "#SBATCH --ntasks-per-socket=1" line and resubmit the batch script. If it still fails, the issue is more likely the CPU request itself (--ntasks-per-node × --cpus-per-task exceeding the cores available on a node).
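For example, on nodes with two sockets and at least 8 cores per socket, a script along these lines should be accepted (a sketch only; the --ntasks-per-socket=2 value assumes two sockets per node, so adjust it to your actual hardware or drop it entirely):

```shell
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2   # assumes 2 sockets per node; remove if unsure
#SBATCH --cpus-per-task=4       # 4 tasks x 4 CPUs = 16 cores needed per node

module load autoload scalapack/2.0.2--intelmpi--2018--binary intel/pe-xe-2018--binary

srun <path_to_bin> <options>
```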

More information about the environment is needed for further analysis.

Upvotes: 1
