kundilcas

Reputation: 85

MPI_Irecv does not properly receive the data sent by MPI_Send

I have 1D matrix data stored as Q_send_matrix. In each iteration, each processor updates its Q_send_matrix and sends it to the previous processor (rank-1), while it receives a newly updated matrix as Q_recv_matrix from the next processor (rank+1). For instance, in an iteration, Proc[0] updates its Q_send_matrix and sends it to Proc[3], while it receives Q_recv_matrix from Proc[1]. As you may have guessed, it is like a ring communication. Please see the code below.

        MPI_Request request;
        MPI_Status status;

        // All the elements of the Q_send and Q_recv buffers
        // are set to 1.0 initially. Each processor then
        // updates its Q_send buffer to prepare it to be
        // sent below. (The update code is long, so it is
        // not included here.)

        /**
         * Transfer Q matrix blocks among processors:
         * each processor sends its Q matrix to the
         * previous processor while receiving the Q
         * matrix from the next processor, like a
         * ring communication.
         */


        /* Receive Q matrix with MPI_Irecv */
        source = (my_rank+1)%comm_size;
        recv_count = no_col_per_proc[source]*input_k;

        MPI_Irecv(Q_recv_matrix, recv_count,
                MPI_FP_TYPE, source,
                0, MPI_COMM_WORLD,
                &request);


        /* Send Q matrix */
        dest = (my_rank-1+comm_size)%comm_size;
        send_count = no_col_per_proc[my_rank]*input_k;

        MPI_Send(Q_send_matrix, send_count,
                MPI_FP_TYPE, dest,
                0, MPI_COMM_WORLD);


        /* Wait status */
        // MPI_Wait(request, status);

        /* Barrier */
        MPI_Barrier(MPI_COMM_WORLD);

        /* Print Q send and receive matrices */
        for( j = 0; j < send_count; j ++ )
        {
            printf("P[%d] sends Q_send[%d] to P[%d] = %.2f\n",
                    my_rank, j, dest, Q_send_matrix[j]);
        }

        for( j = 0; j < recv_count; j ++ )
        {
            printf("P[%d] receives Q_recv[%d] from P[%d] = %.2f\n",
                    my_rank, j, source, Q_recv_matrix[j]);
        }

I want this communication to be synchronous. However, it is not possible with MPI_Send and MPI_Recv alone, because their blocking behavior leads to a deadlock. Hence, I used MPI_Irecv and MPI_Send together with an MPI_Wait. However, the program did not finish; all the processors kept waiting. So I put an MPI_Barrier in place of the MPI_Wait to make them synchronous. That solved the waiting issue and the processors finished their work, but the result is still wrong: some of the output below is incorrect. Each processor sends the correct data, so there is no problem on the sending side. On the receiving side, however, the buffer is not updated: on some processors the initial value of the receive buffer remains, even though data was received from another processor, as shown below.

P[0] sends Q_send[0] to P[3] = -2.12
P[0] sends Q_send[1] to P[3] = -2.12
P[0] sends Q_send[2] to P[3] = 4.12
P[0] sends Q_send[3] to P[3] = 4.12
P[0] receives Q_recv[0] from P[1] = 1.00
P[0] receives Q_recv[1] from P[1] = 1.00
P[0] receives Q_recv[2] from P[1] = 1.00
P[0] receives Q_recv[3] from P[1] = 1.00

P[1] sends Q_send[0] to P[0] = -2.12
P[1] sends Q_send[1] to P[0] = -2.12
P[1] sends Q_send[2] to P[0] = 0.38
P[1] sends Q_send[3] to P[0] = 0.38
P[1] receives Q_recv[0] from P[2] = 1.00
P[1] receives Q_recv[1] from P[2] = 1.00
P[1] receives Q_recv[2] from P[2] = 1.00
P[1] receives Q_recv[3] from P[2] = 1.00

P[2] sends Q_send[0] to P[1] = 1.00
P[2] sends Q_send[1] to P[1] = 1.00
P[2] sends Q_send[2] to P[1] = -24.03
P[2] sends Q_send[3] to P[1] = -24.03
P[2] receives Q_recv[0] from P[3] = 1.00
P[2] receives Q_recv[1] from P[3] = 1.00
P[2] receives Q_recv[2] from P[3] = 1.00
P[2] receives Q_recv[3] from P[3] = 1.00

P[3] sends Q_send[0] to P[2] = 7.95
P[3] sends Q_send[1] to P[2] = 7.95
P[3] sends Q_send[2] to P[2] = 0.38
P[3] sends Q_send[3] to P[2] = 0.38
P[3] receives Q_recv[0] from P[0] = -2.12
P[3] receives Q_recv[1] from P[0] = -2.12
P[3] receives Q_recv[2] from P[0] = 4.12
P[3] receives Q_recv[3] from P[0] = 4.12

Upvotes: 1

Views: 722

Answers (1)

Zulan

Reputation: 22650

You must complete the request with MPI_Wait (or a successful MPI_Test) before accessing the data from an MPI_Irecv. You cannot replace that with a barrier.
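
A minimal sketch of the corrected sequence, reusing the declarations from your question. Note that MPI_Wait takes the request and the status by address, so the commented-out MPI_Wait(request, status) in your code would not even compile:

    /* Post the non-blocking receive, then send as before. */
    MPI_Irecv(Q_recv_matrix, recv_count, MPI_FP_TYPE, source,
              0, MPI_COMM_WORLD, &request);

    MPI_Send(Q_send_matrix, send_count, MPI_FP_TYPE, dest,
             0, MPI_COMM_WORLD);

    /* Complete the receive BEFORE reading Q_recv_matrix.
     * Pass &request and &status; use MPI_STATUS_IGNORE if
     * the status is not needed. */
    MPI_Wait(&request, &status);

    /* Only now is it safe to access Q_recv_matrix. */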

For a ring communication like this, consider using MPI_Sendrecv. It can be simpler than using non-blocking communication.
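
A sketch with the buffers and counts from your question; the combined call lets the MPI library match the sends and receives around the ring internally, so no deadlock cycle can form:

    /* Send to dest and receive from source in one call,
     * using the same tag 0 on both sides as in your code. */
    MPI_Sendrecv(Q_send_matrix, send_count, MPI_FP_TYPE, dest,   0,
                 Q_recv_matrix, recv_count, MPI_FP_TYPE, source, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);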

Upvotes: 2
