Marcel

Reputation: 335

ISend & Recv in MPI: Different value received?

In my matrix addition code, I am transmitting the lower bound to the other processes with MPI_Isend and tag 1, but when I run the code, all the slave processes claim to have received the same lower bound. I don't understand why.

The Output:

I am process 1 and I received 1120 as lower bound
I am process 1 and my lower bound is 1120 and my upper bound is 1682
I am process 2 and I received 1120 as lower bound
I am process 2 and my lower bound is 1120 and my upper bound is 1682
Process 0 here: I am sending lower bound 0 to process 1
Process 0 here: I am sending lower bound 560 to process 2
Process 0 here: I am sending lower bound 1120 to process 3
Timings : 13.300698 Sec
I am process 3 and I received 1120 as lower bound
I am process 3 and my lower bound is 1120 and my upper bound is 1682

The code:

#include <mpi.h>
#include <stdio.h>

#define N_ROWS 1682
#define N_COLS 823
#define MASTER_TO_SLAVE_TAG 1 //tag for messages sent from master to slaves
#define SLAVE_TO_MASTER_TAG 4 //tag for messages sent from slaves to master

void readMatrix();
int rank, nproc, proc;
double matrix_A[N_ROWS][N_COLS];
double matrix_B[N_ROWS][N_COLS];
double matrix_C[N_ROWS][N_COLS];
int low_bound; //low bound of the number of rows of [A] allocated to a slave
int upper_bound; //upper bound of the number of rows of [A] allocated to a slave
int portion; //portion of the number of rows of [A] allocated to a slave
MPI_Status status; //stores the status of an MPI_Recv
MPI_Request request; //captures the request of an MPI_Isend

int main (int argc, char *argv[]) {

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double StartTime = MPI_Wtime();

        // -------------------> Process 0 initializes matrices and sends work portions to other processes
        if (rank==0) {
            readMatrix();
            for (proc = 1; proc < nproc; proc++) {//for each slave other than the master
                portion = (N_ROWS / (nproc - 1)); // calculate portion without master
                low_bound = (proc - 1) * portion;
                if (((proc + 1) == nproc) && ((N_ROWS % (nproc - 1)) != 0)) {//if rows of [A] cannot be equally divided among slaves
                    upper_bound = N_ROWS; //last slave gets all the remaining rows
                } else {
                    upper_bound = low_bound + portion; //rows of [A] are equally divisible among slaves
                }
                //send the low bound first without blocking, to the intended slave
                printf("Process 0 here: I am sending lower bound %i to process %i \n",low_bound,proc);
                MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &request);
                //next send the upper bound without blocking, to the intended slave
                MPI_Isend(&upper_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &request);
                //finally send the allocated row portion of [A] without blocking, to the intended slave
                MPI_Isend(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, proc, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &request);
            }
        }

        //broadcast [B] to all the slaves
        MPI_Bcast(&matrix_B, N_ROWS*N_COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);


        // -------------------> Other processes do their work
        if (rank != 0) {
            //receive low bound from the master
            MPI_Recv(&low_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
            printf("I am process %i and I received %i as lower bound \n",rank,low_bound);
            //next receive upper bound from the master
            MPI_Recv(&upper_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &status);
            //finally receive row portion of [A] to be processed from the master
            MPI_Recv(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, 0, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &status);
            printf("I am process %i and my lower bound is %i and my upper bound is %i \n",rank,low_bound,upper_bound);
            //do your work
            for (int i = low_bound; i < upper_bound; i++) {
                for (int j = 0; j < N_COLS; j++) {
                    matrix_C[i][j] = (matrix_A[i][j] + matrix_B[i][j]);
                }
            }
            //send back the low bound first without blocking, to the master
            MPI_Isend(&low_bound, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &request);
            //send the upper bound next without blocking, to the master
            MPI_Isend(&upper_bound, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG + 1, MPI_COMM_WORLD, &request);
            //finally send the processed portion of data without blocking, to the master
            MPI_Isend(&matrix_C[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, 0, SLAVE_TO_MASTER_TAG + 2, MPI_COMM_WORLD, &request);
        }

        // -------------------> Process 0 gathers the work
        ...

Upvotes: 2

Views: 2359

Answers (1)

francis

Reputation: 9817

MPI_Isend() begins a non-blocking send. Hence, modifying the buffer that is being sent before checking that the message has actually completed results in wrong values being transmitted.

This is what happens in the piece of code you provided, in the master's loop for (proc = 1; proc < nproc; proc++):

  1. proc=1 : low_bound is computed.

  2. proc=1 : low_bound is sent (non-blocking) to process 1.

  3. proc=2 : low_bound is modified before the previous send is guaranteed to have completed. The message to process 1 may therefore carry the new value, and is corrupted.

Different solutions exist:

  • Use the blocking send MPI_Send().

  • Check that the messages have completed: create an array of 3 requests MPI_Request requests[3]; MPI_Status statuses[3];, use non-blocking sends, and call MPI_Waitall() to wait for the completion of the requests (a fuller sketch of the master loop follows this list).

     MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &requests[0]);
     MPI_Isend(..., &requests[1]);
     MPI_Isend(..., &requests[2]);
     MPI_Waitall(3, requests, statuses);
    
  • Take a look at MPI_Scatter() and MPI_Scatterv()!
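
For instance, a minimal sketch of the master loop from your code using MPI_Waitall() (same variable names and bound computation as in your post; error handling omitted) could look like:

    MPI_Request requests[3];
    MPI_Status statuses[3];

    for (proc = 1; proc < nproc; proc++) {
        portion = N_ROWS / (nproc - 1);
        low_bound = (proc - 1) * portion;
        if (((proc + 1) == nproc) && ((N_ROWS % (nproc - 1)) != 0)) {
            upper_bound = N_ROWS; //last slave gets all the remaining rows
        } else {
            upper_bound = low_bound + portion;
        }
        MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &requests[0]);
        MPI_Isend(&upper_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &requests[1]);
        MPI_Isend(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, proc, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &requests[2]);
        //the buffers may only be reused once all three sends have completed
        MPI_Waitall(3, requests, statuses);
    }

This way low_bound and upper_bound are never overwritten while a previous send may still be reading them.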

The "usual" way to do this is to MPI_Bcast() the size of the matrix. Then each process computes the size of its part of the matrix. Process 0 computes the sendcounts and displs needed by MPI_Scatterv().

Upvotes: 3
