Does MPI guarantee the non-overtaking property with mixed MPI_Isend and MPI_Irecv?

I have two MPI processes that exchange a buffer and an "ack" message. Each one posts an MPI_Isend and an MPI_Irecv, then calls MPI_Waitall(...).

Process 0:
  MPI_Isend(...) // sends buffer
  MPI_Irecv(...) // gets ack
  MPI_Waitall(...)

Process 1:
  MPI_Irecv(...) // receives buffer
  MPI_Isend(...) // sends ack
  MPI_Waitall(...)

What I understand from the specification is that a sequence of Isends is guaranteed to be executed in order if they have the same tag, destination rank and communicator. But I am not sure if a given Isend can execute before this Irecv.

#include <chrono>
#include <iostream>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <thread>
#include <vector>

int main() {
  int rank, size;
  MPI_Init(NULL, NULL);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  const int N = 400000000;
  const int VAL = 8;

  if (rank == 0) {
    std::vector<int> buffer(N, VAL);
    MPI_Request requests[2];

    std::this_thread::sleep_for(std::chrono::seconds(2));
    MPI_Isend(buffer.data(), N, MPI_INT, 1, 0, MPI_COMM_WORLD, &requests[0]);
    MPI_Irecv(nullptr, 0, MPI_BYTE, 1, 1, MPI_COMM_WORLD, &requests[1]);

    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    // time when we came back from the waitall
    auto now = std::chrono::high_resolution_clock::now();
    auto now_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      now.time_since_epoch())
                      .count();

    std::cout << "[Rank 0][T=" << now_ns
              << "]: Rank 1 must have the buffer already!\n";

  } else if (rank == 1) {
    std::vector<int> buffer(N, 0);
    MPI_Request requests[2];

    MPI_Irecv(buffer.data(), N, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);
    MPI_Isend(nullptr, 0, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &requests[1]);

    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    // time when we came back from the waitall
    auto now = std::chrono::high_resolution_clock::now();
    auto now_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      now.time_since_epoch())
                      .count();

    std::cout << "[Rank 1][T=" << now_ns << "]: Received buffer!\n";
  }
  MPI_Finalize();
  return 0;
}

I have written the MPI code above (C++, using the C API) to test it, and I expected the print from Rank 1 to always come BEFORE the print from Rank 0, but this is not always the case. Did I get the non-overtaking description wrong, is this test flawed, or is this expected behavior?

Upvotes: 0

Answers (1)

j23

Reputation: 3530

Let me try to answer step by step:

What I understand from the specification is that a sequence of Isends is guaranteed to be executed in order if they have the same tag, destination rank and communicator.

The order is guaranteed only if all the Isends are posted by the same rank. For example,

Process 0:
  MPI_Isend(...) // sends buffer
  MPI_Irecv(...) // gets ack
  MPI_Isend(...) // sends buffer
  MPI_Waitall(...)

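Here the order of the two Isends is preserved, provided they go to the same destination rank with the same tag and communicator and are posted from a single thread: if both messages match the same receive, the one sent first is matched first. This is what the standard means by "messages are non-overtaking".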

But I am not sure if a given Isend can execute before this Irecv.

Yes. An Isend can complete before an Irecv and vice versa, since calling either one only initiates the operation. The non-overtaking rule does not order a send relative to a receive.

I have written some MPI code (C++ bindings) to test it, and I expected the print from Rank 1 to always come BEFORE the print from Rank 0, but this is not always the case.

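This is not guaranteed, for two reasons:

1) The order in which output from different ranks is printed is nondeterministic.
2) The order of completion of Isend and Irecv on two different ranks is nondeterministic. Completion of rank 0's Isend only means its send buffer may be reused, and rank 1 posts the ack without waiting for the buffer, so rank 0's MPI_Waitall can return before rank 1's does.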

Did I get the non-overtaking description wrong, is this test flawed, or this is expected behavior?

Your understanding of non-overtaking is off in one respect: it only applies to messages sent from a single rank, not from a global perspective across ranks. The behaviour of the application is nondeterministic: either rank 1 or rank 0 can print first, depending on the order of printing or the order of completion.

Upvotes: 1
