sperber


MPI collective operations and process lifetime (C/C++)

For the problem I'd like to discuss, let's take MPI_Barrier as an example. The MPI-3 standard states:

If comm is an intracommunicator, MPI_BARRIER blocks the caller until all group members have called it. The call returns at any process only after all group members have entered the call.

So I was wondering (and the same essentially applies to all collective operations) how this assertion should be interpreted when some processes of the communicator have already exited (successfully) before MPI_Barrier is executed. For example, assume we have two processes, A and B, and use MPI_COMM_WORLD as the comm argument to MPI_Barrier. After A and B call MPI_Init, B immediately calls MPI_Finalize and exits, while only A calls MPI_Barrier before calling MPI_Finalize. Is A blocked forever? Or is the set of "all group members" defined as the set of original group members that have not yet exited? I'm fairly sure A is blocked forever, but maybe the MPI standard has more to say about this?

REMARK: This is not a question about the synchronizing properties of MPI_Barrier; the reference to MPI_Barrier is merely meant as a concrete example. It is a question about MPI program correctness when collective operations are performed. See the comments.


Answers (1)

Zulan


If B exits right at program start and only A calls MPI_Barrier, is A blocked for eternity?

Basically, yes. But you are not actually allowed to do that.

Simply put, every process must call MPI_Finalize before exiting. MPI_Finalize is collective over all connected processes (which includes every process in MPI_COMM_WORLD), so it usually does not complete until every process has called it. So in your example, process B would not actually exit, or at least not correctly: it would likely wait inside MPI_Finalize for A, while A waits inside MPI_Barrier for B.

But the MPI 3.1 standard explains it more clearly in Section 8.7:

MPI_Finalize [...] This routine cleans up all MPI state. If an MPI program terminates normally (i.e., not due to a call to MPI_ABORT or an unrecoverable error) then each process must call MPI_FINALIZE before it exits. Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI calls needed to complete its involvement in MPI communications: It must locally complete all MPI operations that it initiated and must execute matching calls needed to complete MPI communications initiated by other processes.

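Note how the last sentence also requires B to complete the barrier from your question: A's MPI_Barrier is a communication initiated by another process, so B must execute the matching MPI_Barrier call before it calls MPI_Finalize.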

According to the standard, your program is not correct. In practice, it will most likely deadlock (hang).

