Reputation: 21
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int myid, numprocs, number_of_completed_operation;
char message = 'a';
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Request* requests = (MPI_Request*)malloc((numprocs - 1)*sizeof(MPI_Request));
MPI_Status* statuses = (MPI_Status*)malloc(sizeof(MPI_Status)*(numprocs - 1));
int* indices = (int *)malloc((numprocs - 1)*sizeof(int));
char* buf = (char *)malloc((numprocs - 1)*sizeof(char));
if (myid != numprocs - 1)
{//worker
printf("***this is sender %d\n", myid);
MPI_Send(&message, 1, MPI_CHAR, numprocs - 1, 110, MPI_COMM_WORLD);
printf("*.*sender %d is done\n", myid);
}
else if (myid == numprocs - 1)
{
//master
int number_of_left_messages = numprocs - 1;//numprocs-1 messages are expected to arrive
int i;
for (i = 0; i < numprocs - 1; i++)
{
MPI_Irecv(&buf+i, 1, MPI_CHAR,i, 110, MPI_COMM_WORLD, &requests[i]);
}
MPI_Waitsome(numprocs - 1, requests, &number_of_completed_operation, indices, statuses);
number_of_left_messages = number_of_left_messages - number_of_completed_operation;
printf("number of completed operation is %d\n", number_of_left_messages);
printf("left message amount is %d\n", number_of_left_messages);
int j;
for (j = 0; j <numprocs - 1; j++)
{
printf("-------------\n");
printf("index is %d\n",indices[j]);
printf("source is %d\n", statuses[j].MPI_SOURCE);
//printf("good\n");
printf("--------====\n");
}
while (number_of_left_messages > 0)
{
MPI_Waitsome(numprocs - 1, requests, &number_of_completed_operation, indices, statuses);
printf("number of completed operation is %d\n", number_of_completed_operation);
for (j = 0; j <numprocs - 1; j++)
{
printf("-------------\n");
printf("index is %d\n", indices[j]);
printf("source is %d\n", statuses[j].MPI_SOURCE);
printf("--------====\n");
}
number_of_left_messages = number_of_left_messages - number_of_completed_operation;
printf("left message amount is %d\n", number_of_left_messages);
The logic is simple: I set the last process as the master, and all the other processes are workers. Each worker sends one message to the master, and the master uses MPI_Waitsome to receive them. When I run with 4 or more processes, I get the following error:
[soit-mpi-pro-1:12197] *** An error occurred in MPI_Waitsome
[soit-mpi-pro-1:12197] *** reported by process [140533176729601,140531329925123]
[soit-mpi-pro-1:12197] *** on communicator MPI_COMM_WORLD
[soit-mpi-pro-1:12197] *** MPI_ERR_REQUEST: invalid request
[soit-mpi-pro-1:12197] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[soit-mpi-pro-1:12197] *** and potentially your MPI job)
Upvotes: 2
Views: 1180
Reputation: 74355
You are passing MPI_Irecv the address of the pointer buf itself plus an offset, instead of the value of buf plus an offset. When a message is received, it overwrites the least significant byte (on little-endian systems like x86/x64) of buf itself and of one or more nearby stack variables, which, depending on the stack layout, might include requests and statuses. MPI_Waitsome then receives a pointer that doesn't point to the beginning of the array of requests but rather somewhere before it, after it, or in the middle of it, hence some of the request handles are invalid and MPI_Waitsome complains. On a big-endian system, this would overwrite the highest byte of the address and would much more likely result in an invalid address and a segmentation fault.
Either use buf+i (as per Wesley Bland's answer) or use &buf[i]. Whether one uses the first or the second form is mostly a matter of personal taste.
Upvotes: 1
Reputation: 9062
It looks like your call to MPI_Irecv might be the problem. Remove the extra & before buf (you have a double pointer instead of a pointer):
MPI_Irecv(buf+i, 1, MPI_CHAR,i, 110, MPI_COMM_WORLD, &requests[i]);
When I fix that, add closing braces and a call to MPI_Finalize(), and remove a bunch of extra output, I don't have any issues running your program:
$ mpiexec -n 8 ./a.out
***this is sender 3
*.*sender 3 is done
***this is sender 4
*.*sender 4 is done
***this is sender 5
*.*sender 5 is done
***this is sender 6
*.*sender 6 is done
***this is sender 0
*.*sender 0 is done
***this is sender 1
*.*sender 1 is done
***this is sender 2
*.*sender 2 is done
number of completed operation is 1
left message amount is 6
number of completed operation is 1
left message amount is 5
number of completed operation is 1
left message amount is 4
number of completed operation is 1
left message amount is 3
number of completed operation is 1
left message amount is 2
number of completed operation is 1
left message amount is 1
number of completed operation is 1
left message amount is 0
I have no idea if it gets the right answer or not, but that's a different question.
Upvotes: 2