Reputation: 159
I post two MPI_Irecv calls followed by two MPI_Send calls and then an MPI_Waitall on both receives, as shown below. The same block of code appears again after a few calculations, but the MPI processes seem to fail already in the first block.
The matrix is split horizontally into as many strips as there are MPI processes, and communication happens only across the strip boundaries: the strip below sends its first row ('start') to the strip above, and the strip above sends its last row ('end') to the strip below.
MPI_Request request[2];
MPI_Status status[2];
double grid[size];
double grida[size];
.
.
.
<Calculation for grid2[][]>
...
MPI_Barrier(MPI_COMM_WORLD);
if (world_rank != 0){
    MPI_Irecv(&grid, size, MPI_DOUBLE, world_rank-1, 0, MPI_COMM_WORLD, &request[1]);
    printf("1 MPI_Irecv");
}
if (world_rank != world_size-1){
    MPI_Irecv(&grida, size, MPI_DOUBLE, world_rank+1, 1, MPI_COMM_WORLD, &request[0]);
    printf("2 MPI_Irecv");
}
if (world_rank != world_size-1){
    MPI_Send(grid2[end], size, MPI_DOUBLE, world_rank+1, 0, MPI_COMM_WORLD);
    printf("1 MPI_Send");
}
if (world_rank != 0){
    MPI_Send(grid2[start], size, MPI_DOUBLE, world_rank-1, 1, MPI_COMM_WORLD);
    printf("2 MPI_Send");
}
MPI_Waitall(2, request, status);
MPI_Barrier(MPI_COMM_WORLD);
.
.
.
<The same block as above again, but without declaring MPI_Request and MPI_Status a second time>
For this I'm getting the following error:
*** Process received signal ***
Signal: Bus error: 10 (10)
Signal code: Non-existant physical address (2)
Failing at address: 0x108bc91e3
[ 0] 0 libsystem_platform.dylib 0x00007fff50b65f5a _sigtramp + 26
[ 1] 0 ??? 0x000000010c61523d 0x0 + 4502671933
[ 2] 0 libmpi.20.dylib 0x0000000108bc8e4a MPI_Waitall + 154
[ 3] 0 dist-jacobi 0x0000000104b55770 Work + 1488
[ 4] 0 dist-jacobi 0x0000000104b54f01 main + 561
[ 5] 0 libdyld.dylib 0x00007fff508e5145 start + 1
[ 6] 0 ??? 0x0000000000000003 0x0 + 3
*** End of error message ***
*** An error occurred in MPI_Waitall
*** reported by process [1969881089,3]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_REQUEST: invalid request
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node dhcp-10 exited on signal 10 (Bus error: 10).
--------------------------------------------------------------------------
Why is MPI_Waitall throwing this error, and why is printf("1 MPI_Irecv");
not being printed? Everything before this print statement is printed properly.
The code works with MPI_Wait() and MPI_Isend() as follows:
// insert barrier
MPI_Barrier(MPI_COMM_WORLD);
if (world_rank != 0){
    MPI_Irecv(&grid, size*2, MPI_DOUBLE, world_rank-1, 0, MPI_COMM_WORLD, &request[0]);
    printf("1 MPI_Irecv");
}
if (world_rank != world_size-1){
    MPI_Irecv(&grida, size*2, MPI_DOUBLE, world_rank+1, 1, MPI_COMM_WORLD, &request[1]);
    printf("2 MPI_Irecv");
}
if (world_rank != world_size-1){
    MPI_Isend(grid2[end], size*2, MPI_DOUBLE, world_rank+1, 0, MPI_COMM_WORLD, &request[0]);
    printf("1 MPI_Send");
}
if (world_rank != 0){
    MPI_Isend(grid2[start], size*2, MPI_DOUBLE, world_rank-1, 1, MPI_COMM_WORLD, &request[1]);
    printf("2 MPI_Send");
}
//MPI_Waitall(2, request, status);
MPI_Wait(&request[0], &status[0]);
MPI_Wait(&request[1], &status[1]);
Upvotes: 0
Views: 929
Reputation: 8395
request[0] is used uninitialized on the last rank, and request[1] is used uninitialized on the first rank.
A possible fix is to statically initialize the request array (assuming it is not used anywhere else in your code):
MPI_Request request[2] = {MPI_REQUEST_NULL, MPI_REQUEST_NULL};
As a side note, you might want to consider renaming request into requests and status into statuses to make it crystal clear these are arrays and not scalars.
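To illustrate, here is a minimal sketch of the fixed exchange, assuming the same variables as in your question (grid, grida, grid2, start, end, size, world_rank, world_size). MPI_Waitall treats MPI_REQUEST_NULL entries as already completed, so the boundary ranks that skip one of the receives no longer pass a garbage handle:
MPI_Request requests[2] = {MPI_REQUEST_NULL, MPI_REQUEST_NULL};
MPI_Status statuses[2];
if (world_rank != 0)
    MPI_Irecv(grid, size, MPI_DOUBLE, world_rank-1, 0, MPI_COMM_WORLD, &requests[1]);
if (world_rank != world_size-1)
    MPI_Irecv(grida, size, MPI_DOUBLE, world_rank+1, 1, MPI_COMM_WORLD, &requests[0]);
if (world_rank != world_size-1)
    MPI_Send(grid2[end], size, MPI_DOUBLE, world_rank+1, 0, MPI_COMM_WORLD);
if (world_rank != 0)
    MPI_Send(grid2[start], size, MPI_DOUBLE, world_rank-1, 1, MPI_COMM_WORLD);
/* entries still equal to MPI_REQUEST_NULL are simply ignored by MPI_Waitall */
MPI_Waitall(2, requests, statuses);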
Upvotes: 1