Recently, I have been developing some asynchronous algorithms for my research. While doing some parallel performance studies, I have come to suspect that I do not properly understand some details of the various non-blocking MPI functions.
I've seen some insightful posts on here, namely:
There are a few things about non-blocking functionality that I am uncertain about or just want to clarify, and I think clearing them up could help me increase the performance of my current software.
From the Nonblocking Communication part of the MPI 3.0 standard:
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed.
...
If the send mode is standard then the send-complete call may return before a matching receive is posted, if the message is buffered. On the other hand, the send-complete may not complete until a matching receive is posted, and the message was copied into the receive buffer.
So, as a first set of questions about MPI_Isend (and similarly MPI_Irecv): it seems that, to ensure a non-blocking send finishes, I need some mechanism to check that it is complete, because in the worst case there may not be suitable hardware to transfer the data concurrently, right? So if I never call something like MPI_Test or MPI_Wait after the non-blocking send, the MPI_Isend may never actually get its message out, right?
This question applies to some of my work because I send messages via MPI_Isend and do not actually test for completion until I get the expected response message, since I want to avoid the overhead of MPI_Test calls. While this approach has been working, it seems faulty based on my reading.
Further, the second paragraph appears to say that for the standard non-blocking send, MPI_Isend, it may not even begin to send any of its data until the destination process has called a matching receive. Given the availability of MPI_Probe/MPI_Iprobe, does this mean an MPI_Isend call will at least send out some preliminary metadata of the message, such as size, source, and tag, so that the probe functions on the destination process can know a message wants to be sent there and so the destination process can actually post a corresponding receive?
Related is a question about the probe. In the Probe and Cancel section, the standard says that
MPI_IPROBE(source, tag, comm, flag, status) returns flag = true if there is a message that can be received and that matches the pattern specified by the arguments source, tag, and comm. The call matches the same message that would have been received by a call to MPI_RECV(..., source, tag, comm, status) executed at the same point in the program, and returns in status the same value that would have been returned by MPI_RECV(). Otherwise, the call returns flag = false, and leaves status undefined.
Going off of the above passage, it is clear that probing will tell you whether there is an available message you can receive corresponding to the specified source, tag, and comm. My question is: should you assume that the data for the send matched by a successful probe has not actually been transferred yet?
After reading the standard, it now seems reasonable to me that a message the probe is aware of need not be a message that the local process has actually fully received. Given the previous details about the standard non-blocking send, it seems you would need to post a receive after probing to ensure that the source's standard non-blocking send will complete, because there might be times when the source is sending a large message that MPI does not want to copy into some internal buffer, right? Either way, it seems that posting the receive after probing is how you ensure that the full data from the corresponding send is actually transferred. Is this correct?
This latter question relates to one place in my code where I make an MPI_Iprobe call and, if it succeeds, perform an MPI_Recv call to get the message. However, I now think this could be problematic, because I was assuming that a successful probe meant the whole message had already arrived. That implied to me that the MPI_Recv would run quickly, since the full message would already be in local memory somewhere. I now suspect this was an incorrect assumption, so some clarification on this would be helpful.
The MPI standard does not mandate a progress thread. That means MPI_Isend() might do nothing at all until communications are progressed. Progress is made under the hood by most MPI subroutines; MPI_Test(), MPI_Wait() and MPI_Probe() are the most obvious ones.
I am afraid you are confusing progress with synchronous send (e.g. MPI_Ssend()).
MPI_Probe() is a local operation: it will not contact the sender to ask whether something was sent, nor will it progress that send.
Performance-wise, you should avoid unexpected messages as much as possible; that is, a receive should be posted on one end before the message is sent by the other end.
There is a trade-off between performance and portability here: keep in mind that most MPI implementations (read: this is not mandated by the MPI standard, and you should not rely on it) send small messages in eager mode. That means MPI_Send() will likely return immediately if the message is small enough (and "small enough" depends, among other things, on your MPI implementation, how it is tuned, and which interconnect is used).