Reputation: 494
I use a cluster which contains several nodes. Each them has 2 processors with 8 cores inside. I use Open MPI with SLURM.
My tests show that MPI Send/Recv data transfer rate is the following: between MPI process with rank 0 and MPI process 1 it's about 9 GB/sec, but between process 0 and process 2 it's 5 GB/sec. I assume that this happens because our processes execute on different processors.
I'd like to avoid non-local memory access. The recommendations I found here did not help. So the question is, is it possible to run 8 MPI processes - all on THE SAME processor? If it is - how do I do it?
Thanks.
Upvotes: 2
Views: 655
Reputation: 59110
With Slurm, set:
#SBATCH --ntasks=8
#SBATCH --ntasks-per-socket=8
to have all cores allocated on the same socket (CPU die), provided Slurm is correctly configured.
Upvotes: 0
Reputation: 74375
The following set of command-line options to mpiexec
should do the trick with versions of Open MPI before 1.7:
--by-core --bind-to-core --report-bindings
The last option will pretty-print the actual binding for each rank. Binding also activates some NUMA-awareness in the shared-memory BTL module.
Starting with Open MPI 1.7, processes are distributed round-robin over the available sockets and bound to a single core by default. To replicate the above command line, one should use:
--map-by core --bind-to core --report-bindings
Upvotes: 3
Reputation: 9062
You should look at the hostfile / rankfile documentation for your MPI library. Open MPI and MPICH both use different formats, but both will give you what you want.
Keep in mind that you will have performance issues if you oversubscribe your processor too heavily. Running more than 8 ranks on an 8 core processor will cause you to lose the performance benefits you gain from having locally shared memory.
Upvotes: 0
Reputation: 745
It appears to be possible. The Process Binding and Rankfiles sections of the OpenMPI mpirun man page look promising. I would try some of the options shown with the --report-binding option set so you can verify process placement is how you intend and see if you get the performance improvement you expect out of your code.
Upvotes: 0