Reputation: 58
I am trying to start a sequence of MPI_Igather calls (non-blocking collectives, available since MPI 3), do some work, and then, whenever an Igather finishes, do some more work on that data.
That works fine, unless I start the Igathers from different threads on each MPI rank. In that case I often get a deadlock, even though I call MPI_Init_thread to make sure that MPI_THREAD_MULTIPLE is provided. Non-blocking collectives do not have a tag to match sends and receives, but I thought this was handled by the MPI_Request object associated with each collective operation?
The simplest failing example I could come up with is shown below.
I made two variants of this program: igather_threaded.cpp (the code below), which behaves as described above, and igather_threaded_v2.cpp, which gathers everything on MPI rank 0. That version does not deadlock, but the data is not ordered correctly either.
igather_threaded.cpp:
#include <iostream>
#include <memory>
#include <mpi.h>
#define PRINT_ARRAY(STR,PTR) \
  for (int r=0; r<nproc; r++){ \
    if (rank==r){ \
      for (int j=0; j<nproc; j++){ \
        (STR) << (PTR)[j] << " "; \
      } \
      (STR) << std::endl << std::flush; \
    } \
    MPI_Barrier(MPI_COMM_WORLD); \
  }
int main(int argc, char* argv[]) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided != MPI_THREAD_MULTIPLE){
    std::cerr << "Your MPI installation does not provide threading level MPI_THREAD_MULTIPLE, required for this program." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -1);
  }

  int rank, nproc;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  std::unique_ptr<double[]> M1(new double[nproc]);
  std::unique_ptr<double[]> M2(new double[nproc]);
  for (int j=0; j<nproc; j++) M1[j] = rank*nproc + j + 1;
  PRINT_ARRAY(std::cout, M1);

  std::unique_ptr<MPI_Request[]> requests(new MPI_Request[nproc]);

  // start one Igather per root, each from a different OpenMP thread
  #pragma omp parallel for schedule(static) shared(requests)
  for (int j=0; j<nproc; j++){
    MPI_Igather(
        M1.get()+j, 1, MPI_DOUBLE,
        M2.get(),   1, MPI_DOUBLE, j,
        MPI_COMM_WORLD, &requests[j]
    );
  }
  MPI_Waitall(nproc, requests.get(), MPI_STATUSES_IGNORE);

  if (rank==0) std::cout << " => " << std::endl;
  PRINT_ARRAY(std::cout, M2);

  MPI_Finalize();
  return 0;
}
My question is: is my code theoretically correct, and is this a bug in OpenMPI (4.0.3 used here), or did I miss something that disallows starting MPI_Igather calls from multiple threads? I also tried it with Intel MPI, with a similar result.
To compile, I use OpenMPI 4.0.3 (configured to support MPI_THREAD_MULTIPLE), gcc 11.2.0, and the command line
> mpicxx -std=c++17 -fopenmp -o igather_threaded igather_threaded.cpp
To run, I use
mpirun -np 4 --bind-to none env OMP_NUM_THREADS=4 env OMP_PROC_BIND=false ./igather_threaded
You may have to increase the number of MPI ranks and/or OpenMP threads, or run it several times, because the exact behavior is not deterministic.
Upvotes: 4
Views: 194
Reputation: 5794
In your scheme the collectives are started from different threads, so they can be issued in any order. Having distinct requests is not enough to disambiguate them: MPI insists that collectives on a given communicator are started in the same order on each process.
You could fix this by:
#pragma omp parallel for schedule(static) shared(requests) ordered
for (int j=0; j<nproc; j++){
  #pragma omp ordered
  MPI_Igather(
      M1.get()+j, 1, MPI_DOUBLE,
      M2.get(),   1, MPI_DOUBLE, j,
      MPI_COMM_WORLD, &requests[j]
  );
}
but that serializes the calls and takes away most of the speedup you were hoping to get from threading.
Upvotes: 0
Reputation: 56
The MPI standard states in §6.12 that
All processes must call collective operations (blocking and nonblocking) in the same order per communicator.
Two Igather operations with different root arguments are different collective operations. Your threads issue them on the same communicator without any synchronization, so the resulting order may not be the same on all processes.
One solution would be to use a separate communicator for each thread (assuming all processes use the same number of threads). That way each communicator carries only a single collective operation, and you can be sure that the right operations match up between processes. If you repeat that code multiple times, just create the communicators up front and reuse them.
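A minimal sketch of that approach, assuming the same variables as in your question (M1, M2, requests, nproc); the comms array and its setup are illustrative additions, not part of your original code:
// Illustrative sketch: one duplicated communicator per gather.
// MPI_Comm_dup is itself a collective, so it is called from a single
// thread, in the same order on all processes.
std::unique_ptr<MPI_Comm[]> comms(new MPI_Comm[nproc]);
for (int j=0; j<nproc; j++){
  MPI_Comm_dup(MPI_COMM_WORLD, &comms[j]);
}

#pragma omp parallel for schedule(static) shared(requests, comms)
for (int j=0; j<nproc; j++){
  MPI_Igather(
      M1.get()+j, 1, MPI_DOUBLE,
      M2.get(),   1, MPI_DOUBLE, j,
      comms[j], &requests[j]   // each operation is alone on its communicator
  );
}
MPI_Waitall(nproc, requests.get(), MPI_STATUSES_IGNORE);

for (int j=0; j<nproc; j++) MPI_Comm_free(&comms[j]);
Duplicating communicators has some cost, which is why creating them once and reusing them pays off when the gathers are repeated.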
Upvotes: 3