Austin_Power

Reputation: 21

An error occurred in MPI_Waitsome

#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myid, numprocs, number_of_completed_operation;

    char message = 'a';


    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    MPI_Request* requests = (MPI_Request*)malloc((numprocs - 1)*sizeof(MPI_Request));
    MPI_Status* statuses = (MPI_Status*)malloc(sizeof(MPI_Status)*(numprocs - 1));
    int* indices = (int *)malloc((numprocs - 1)*sizeof(int));
    char* buf = (char *)malloc((numprocs - 1)*sizeof(char));

    if (myid != numprocs - 1)
    {//worker

        printf("***this is sender %d\n", myid);
        MPI_Send(&message, 1, MPI_CHAR, numprocs - 1, 110, MPI_COMM_WORLD);
        printf("*.*sender %d is done\n", myid);



    }
    else if (myid == numprocs - 1)
    {
        //master
        int number_of_left_messages = numprocs - 1; // numprocs-1 messages will arrive
        int i;
        for (i = 0; i < numprocs - 1; i++)
        {
            MPI_Irecv(&buf+i, 1, MPI_CHAR,i, 110, MPI_COMM_WORLD, &requests[i]);
        }

        MPI_Waitsome(numprocs - 1, requests, &number_of_completed_operation, indices, statuses);


        number_of_left_messages = number_of_left_messages - number_of_completed_operation;
        printf("number of completed operation is %d\n", number_of_left_messages);
        printf("left message amount is %d\n", number_of_left_messages);

        int j;
        for (j = 0; j <numprocs - 1; j++)
        {
            printf("-------------\n");
            printf("index is %d\n",indices[j]);
            printf("source is %d\n", statuses[j].MPI_SOURCE);
            //printf("good\n");
            printf("--------====\n");

        }

        while (number_of_left_messages > 0)
        {
            MPI_Waitsome(numprocs - 1, requests, &number_of_completed_operation, indices, statuses);

            printf("number of completed operation is %d\n", number_of_completed_operation);
            for (j = 0; j <numprocs - 1; j++)
            {
                printf("-------------\n");
                printf("index is %d\n", indices[j]);
                printf("source is %d\n", statuses[j].MPI_SOURCE);
                printf("--------====\n");
            }
            number_of_left_messages = number_of_left_messages - number_of_completed_operation;
            printf("left message amount is %d\n", number_of_left_messages);

The logic is simple: I set the last process as the master and all the other processes as workers. Each worker sends one message to the master, and the master uses MPI_Waitsome to receive them. When I set the number of processes to 4 or larger, the system shows me the following error:

[soit-mpi-pro-1:12197] *** An error occurred in MPI_Waitsome
[soit-mpi-pro-1:12197] *** reported by process [140533176729601,140531329925123]
[soit-mpi-pro-1:12197] *** on communicator MPI_COMM_WORLD
[soit-mpi-pro-1:12197] *** MPI_ERR_REQUEST: invalid request
[soit-mpi-pro-1:12197] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[soit-mpi-pro-1:12197] ***    and potentially your MPI job)

Upvotes: 2

Views: 1180

Answers (2)

Hristo Iliev

Reputation: 74355

You are passing MPI_Irecv the address of the pointer variable buf itself plus an offset (&buf+i) instead of the value of buf plus an offset. When a message is received, it overwrites one byte (the least significant one on little-endian systems like x86/x64) of buf or of a nearby stack variable, which, depending on the stack layout, might include requests and statuses. MPI_Waitsome then receives a pointer that does not point to the beginning of the array of requests but somewhere before it, after it, or in the middle of it; hence some of the request handles are invalid and MPI_Waitsome complains. On a big-endian system, this would overwrite the highest byte of the address and would more likely result in an invalid address and a segmentation fault.

Either use buf+i (as per Wesley Bland's answer) or use &buf[i]. The two are equivalent, and I usually find it a matter of personal taste whether one uses the first or the second form.
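To make the difference concrete, here is a minimal sketch in plain C with no MPI calls. It assumes a simplified model in which buf, requests and statuses sit in adjacent pointer-sized slots; the real stack layout is up to the compiler. The names slots, stray_receive and correct_receive are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed model: three adjacent pointer-sized variables, as they might
   (but need not) be laid out on the master's stack. */
enum { SLOT_BUF = 0, SLOT_REQUESTS = 1, SLOT_STATUSES = 2, NUM_SLOTS = 3 };

/* Models the buggy address &buf + i: the arithmetic steps by the size of
   a pointer, so the received byte lands in the i-th slot after buf and
   clobbers one byte of that variable's value. */
void stray_receive(uintptr_t slots[NUM_SLOTS], int i, char message)
{
    uintptr_t *target = &slots[SLOT_BUF] + i;
    memcpy(target, &message, 1);   /* stands in for a 1-byte MPI_CHAR receive */
}

/* Models the fixed address buf + i (same as &buf[i]): the arithmetic
   steps by sizeof(char), so the byte lands in element i of the buffer. */
void correct_receive(char *buf, int i, char message)
{
    char *target = buf + i;
    memcpy(target, &message, 1);
}
```

Calling stray_receive(slots, 1, 'a') changes the slot that stands in for requests, while correct_receive(heap, 1, 'a') touches only heap[1].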

Upvotes: 1

Wesley Bland

Reputation: 9062

It looks like your call to MPI_Irecv might be the problem. Remove the extra & before buf (you are passing a pointer to the pointer instead of the pointer itself).

MPI_Irecv(buf+i, 1, MPI_CHAR,i, 110, MPI_COMM_WORLD, &requests[i]);

When I fix that, add closing braces and a call to MPI_Finalize(), and remove a bunch of extra output, I don't have any issues running your program:

$ mpiexec -n 8 ./a.out
***this is sender 3
*.*sender 3 is done
***this is sender 4
*.*sender 4 is done
***this is sender 5
*.*sender 5 is done
***this is sender 6
*.*sender 6 is done
***this is sender 0
*.*sender 0 is done
***this is sender 1
*.*sender 1 is done
***this is sender 2
*.*sender 2 is done
number of completed operation is 1
left message amount is 6
number of completed operation is 1
left message amount is 5
number of completed operation is 1
left message amount is 4
number of completed operation is 1
left message amount is 3
number of completed operation is 1
left message amount is 2
number of completed operation is 1
left message amount is 1
number of completed operation is 1
left message amount is 0

I have no idea if it gets the right answer or not, but that's a different question.
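On the correctness of the output: as far as I know, MPI_Waitsome fills only the first outcount entries of the indices and statuses arrays on each call, so the printing loops in the question, which run over all numprocs - 1 entries, would also print stale or uninitialized values. A minimal sketch of the bookkeeping, written without MPI calls so it can be compiled on its own; record_completions and the done array are hypothetical names, not part of MPI:

```c
/* Hypothetical helper: process the result of one MPI_Waitsome round.
   outcount  - number of requests that completed in this round
   indices   - array filled by MPI_Waitsome; only indices[0..outcount-1]
               are meaningful
   done      - caller-owned flags, one per request, set to 1 on completion
   remaining - number of messages still outstanding before this round
   Returns the number of messages still outstanding after this round. */
int record_completions(int outcount, const int *indices,
                       int *done, int remaining)
{
    for (int k = 0; k < outcount; k++) {
        done[indices[k]] = 1;   /* look only at the entries that were filled */
    }
    return remaining - outcount;
}
```

In the master loop this would be called right after each MPI_Waitsome, passing number_of_completed_operation as outcount and number_of_left_messages as remaining.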

Upvotes: 2
