For asynchronous communication in MPI, which of the following is better (in terms of performance, reliability, readability, etc.): a non-blocking send paired with a blocking receive (MPI_Isend + MPI_Recv), or a blocking send paired with a non-blocking receive (MPI_Send + MPI_Irecv)?
The communication scenario is that data has to be exchanged asynchronously, the arrival times do not matter, and both processes have their own workload. Only the overall performance (in particular, avoiding blocking) is considered.
Below is a minimal working example (I did not include the workload, so the timings are probably not meaningful).
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char const *argv[]) {
    MPI_Init(NULL, NULL);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    if (world_rank == 0 && world_size != 2) {
        fprintf(stderr, "This example requires two MPI processes.\n");
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    /* Case 1: non-blocking send, blocking receive */
    int buf[100] = {0};
    MPI_Barrier(MPI_COMM_WORLD);
    double time = MPI_Wtime();
    if (world_rank == 1) {
        MPI_Request request;
        MPI_Isend(buf, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(buf, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    time = MPI_Wtime() - time;
    printf("rank = %d, time = %f sec\n", world_rank, time);

    MPI_Barrier(MPI_COMM_WORLD);
    usleep(100); /* give the previous output a moment to flush */
    if (world_rank == 0) {
        printf("---\n");
    }

    /* Case 2: blocking send, non-blocking receive */
    MPI_Barrier(MPI_COMM_WORLD);
    time = MPI_Wtime();
    if (world_rank == 1) {
        MPI_Send(buf, 100, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        MPI_Request request;
        MPI_Irecv(buf, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);
    }
    time = MPI_Wtime() - time;
    printf("rank = %d, time = %f sec\n", world_rank, time);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
On my machine this generates:
rank = 0, time = 0.000035 sec
rank = 1, time = 0.000036 sec
---
rank = 0, time = 0.000035 sec
rank = 1, time = 0.000026 sec
Thank you in advance for your answers, and have a nice day :)
Answer:
A general rule of thumb is to post non-blocking receives as early as possible and to do the sends afterwards.
Doing it the other way around is not bad; however, you can overlap the sending and receiving better when the receives are posted first.
For this reason, one should go for the second scenario you describe (MPI_Send + MPI_Irecv). Also, if you do not use non-blocking communication, be careful of deadlocks. For deadlocks, see MPI Send and receive questions.
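As a minimal sketch of that recommended ordering (assuming exactly two ranks that exchange a fixed-size buffer; do_work() is a hypothetical placeholder for the local workload):

int sbuf[100] = {0}, rbuf[100];
int peer = 1 - world_rank;  /* the other rank, assuming world_size == 2 */
MPI_Request rreq;

MPI_Irecv(rbuf, 100, MPI_INT, peer, 0, MPI_COMM_WORLD, &rreq);  /* post the receive first */
do_work();                                                      /* local workload overlaps the transfer */
MPI_Send(sbuf, 100, MPI_INT, peer, 0, MPI_COMM_WORLD);
MPI_Wait(&rreq, MPI_STATUS_IGNORE);                             /* complete the receive */

Posting the receive first also avoids the classic deadlock where both ranks enter a blocking MPI_Send at the same time and neither has a matching receive posted.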
Exercise for the reader:
In the following fragments, assume that all buffers have been allocated with sufficient size and that reqs is an array of at least size-1 MPI_Request objects. Moreover, rank and size denote the rank of each process and the total number of MPI processes, respectively. For each fragment, note whether it deadlocks or not and explain why. Also report any performance issues.
snippet 1:
int ireq = 0;
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Isend(sbuffers[p], buflen, MPI_INT, p, 0, comm, &reqs[ireq++]);
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Recv(rbuffer, buflen, MPI_INT, p, 0, comm, MPI_STATUS_IGNORE);
MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);
snippet 2:
int ireq = 0;
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Irecv(rbuffers[p], buflen, MPI_INT, p, 0, comm, &reqs[ireq++]);
MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Send(sbuffer, buflen, MPI_INT, p, 0, comm);
Solution:
Snippet 1 does not deadlock, since all the sends are non-blocking, but it has a performance issue: the receives are posted only after the sends, blocking, and in a fixed order, so they cannot overlap with the sends. The fix is to invert the two phases: post the receives (non-blocking) first, then do the sends.
Snippet 2 deadlocks because of the MPI_Waitall: every process waits for all of its receives to complete before posting a single send, so no data is ever sent. The fix is to move the MPI_Waitall after the send loop.
Both fixes converge on the same corrected ordering, sketched below.
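Here is a sketch of that corrected ordering (reusing the per-peer receive buffers rbuffers and the single send buffer sbuffer from the fragments above):

int ireq = 0;
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Irecv(rbuffers[p], buflen, MPI_INT, p, 0, comm, &reqs[ireq++]);
for (int p = 0; p < size; p++)
    if (p != rank)
        MPI_Send(sbuffer, buflen, MPI_INT, p, 0, comm);
MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);  /* now after the sends */

Since every receive is pre-posted before any blocking send starts, each send always finds a matching receive, so the loop cannot deadlock regardless of whether MPI_Send buffers internally.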