Mircea

Reputation: 1999

C++ MPI: could not send anything

I'm trying to compute the sum of two matrices using MPI. I don't know why, but I can't send any kind of data using MPI_Send. Whatever I try, I get this error message:

Sending 3 rows to task 1 offset=0
Sending 3 rows to task 2 offset=3
Sending 2 rows to task 3 offset=6
Sending 2 rows to task 4 offset=8
*** An error occurred in MPI_Send
*** reported by process [1047527425,0]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_RANK: invalid rank
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)

Here is my code:

# include <mpi.h>
# include <stdio.h>
# include <stdlib.h>
# include <time.h>
# include <vector>

#define ROWS 10
#define COLONS 10
#define MASTER 0

using namespace std;

int main(int argc, char *argv[]) {

    int rows;

    int averow=0;
    int extra=0;
    int offset;
    int numprocs;
    MPI_Status status;
    int matrixA[ROWS][COLONS];
    int matrixB[ROWS][COLONS];
    int matrixC[ROWS][COLONS];

    for (int i = 0; i < COLONS; i++) {
        for (int j = 0; j < ROWS; j++) {
            matrixA[i][j] = rand() % 10;
            matrixB[i][j] = rand() % 10;
        }
    }
    int my_id;

    MPI_Init(&argc, &argv);
    MPI_Comm_size( MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank( MPI_COMM_WORLD, &my_id);
    if (my_id == MASTER) {

        averow = ROWS / numprocs;
        extra = ROWS % numprocs;
        offset = 0;

        /* Send matrix data to the worker tasks */
        for (int dest = 1; dest <= numprocs; dest++) {
            rows = (dest <= extra) ? averow + 1 : averow;
            printf("Sending %d rows to task %d offset=%d\n", rows, dest, offset);
            MPI_Send(&offset, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
            MPI_Send(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
            MPI_Send(&matrixA[offset][0], rows * ROWS, MPI_DOUBLE, dest, 1,
            MPI_COMM_WORLD);
            MPI_Send(&matrixB, COLONS * COLONS, MPI_INT, dest, 1,
            MPI_COMM_WORLD);
            offset = offset + rows;
        }

        /* Receive results from worker tasks */
        for (int i = 1; i <= numprocs; i++) {
            int source = i;
            MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
            MPI_Recv(&rows, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
            MPI_Recv(&matrixC[offset][0], rows * COLONS, MPI_INT, source, 2,
            MPI_COMM_WORLD, &status);
            printf("Received results from task %d\n", source);
        }
    }

    if (my_id != MASTER) {
        MPI_Recv(&offset, 1, MPI_INT, MASTER, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&rows, 1, MPI_INT, MASTER, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&matrixA, rows * COLONS, MPI_DOUBLE, MASTER, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&matrixB, COLONS * COLONS, MPI_DOUBLE, MASTER, 1,
        MPI_COMM_WORLD, &status);

        for (int k = 0; k < COLONS; k++) {
            for (int i = 0; i < rows; i++) {
                matrixC[k][i] = matrixA[k][i] + matrixB[k][i];
            }
        }
        MPI_Send(&offset, 1, MPI_INT, MASTER, 2, MPI_COMM_WORLD);
        MPI_Send(&rows, 1, MPI_INT, MASTER, 2, MPI_COMM_WORLD);
        MPI_Send(&matrixC, rows * COLONS, MPI_DOUBLE, MASTER, 2,
        MPI_COMM_WORLD);
    }
    MPI_Finalize();

    return 0;
}

I'm running this program with 8 processes.

Do you have any idea what I'm doing wrong here, guys? I can't see anything.

Upvotes: 1

Views: 175

Answers (1)

Zulan

Reputation: 22650

There are multiple things wrong in your code:

  1. The loops for dest and i must use < numprocs, because valid ranks run from 0 to numprocs - 1. Otherwise your code is trying to send to rank 8, which does not exist!
  2. At some points you are using the MPI_DOUBLE datatype, despite not having any double data. Sending an MPI_INT and receiving it as an MPI_DOUBLE doesn't work either.
  3. MPI_Send(&matrixA[offset][0], rows * ROWS, ..., should be rows * COLONS.
  4. MPI_Send(&matrixB, COLONS * COLONS, ..., should be ROWS * COLONS, also on the corresponding MPI_Recv.
  5. Transferring the entire matrixB while sending only chunks of matrixA makes no sense for computing the addition.
  6. The first dimension of your matrix is the row, the second one is the column. However, your addition loop mixes them up.
  7. rows and offset in your "Receive results from worker tasks" loop are not set up correctly.
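
To make points 3 and 6 concrete, here is a minimal sketch without any MPI calls. The helper name add_row_block and the flat, row-major std::vector layout are assumptions for illustration and do not appear in the question. It shows that a block of rows holds rows * columns elements and that the outer loop should run over rows while the inner loop runs over columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): adds `num_rows` consecutive
// rows of two row-major matrices that have `num_cols` columns each.
// Element (row, col) lives at index row * num_cols + col, so the first
// index is the row and the second one is the column.
std::vector<int> add_row_block(const std::vector<int>& a,
                               const std::vector<int>& b,
                               int num_rows, int num_cols) {
    // A block of rows holds num_rows * num_cols elements. This is also the
    // count a matching MPI_Send / MPI_Recv pair (both MPI_INT) would use.
    const std::size_t count = static_cast<std::size_t>(num_rows) * num_cols;
    assert(a.size() == count && b.size() == count);

    std::vector<int> c(count);
    for (int row = 0; row < num_rows; ++row) {        // outer loop: rows
        for (int col = 0; col < num_cols; ++col) {    // inner loop: columns
            const std::size_t idx =
                static_cast<std::size_t>(row) * num_cols + col;
            c[idx] = a[idx] + b[idx];
        }
    }
    return c;
}
```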

I'm not sure I caught every actual error. There are also some aspects that can be significantly improved:

  1. Having a constant ROWS and a variable rows with different meanings is extremely detrimental to understanding the code.
  2. Your communication setup is needlessly complex. You can simplify the patterns in many places, e.g. compute rows and offset locally instead of sending them around. But most importantly, you should use collective operations. That is both much easier to reason about and also performs much better.
  3. In MPI, the master rank generally participates in the computation.
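
As a rough illustration of computing rows and offset locally, the sketch below derives both from nothing but the rank and the number of ranks, so no message is needed to communicate them. The names RowRange and rows_for_rank are hypothetical, and the split assumes that every rank, including the master, takes a share of the rows.

```cpp
#include <cassert>

// Hypothetical type: the contiguous range of rows owned by one rank.
struct RowRange {
    int first_row;  // index of the first row owned by this rank
    int num_rows;   // number of rows owned by this rank
};

// Splits total_rows across num_ranks as evenly as possible. The first
// (total_rows % num_ranks) ranks receive one additional row. Every rank
// can call this with its own rank and get a consistent answer.
RowRange rows_for_rank(int total_rows, int num_ranks, int rank) {
    assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
    const int base = total_rows / num_ranks;
    const int extra = total_rows % num_ranks;

    RowRange range;
    range.num_rows = base + (rank < extra ? 1 : 0);
    // All earlier ranks own `base` rows each, plus one extra row for each
    // of the first min(rank, extra) ranks.
    range.first_row = rank * base + (rank < extra ? rank : extra);
    return range;
}
```

With 10 rows and 4 ranks this yields 3, 3, 2 and 2 rows starting at rows 0, 3, 6 and 8, which is the same split the question printed, but assigned to ranks 0 to 3.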

Don't be discouraged. It can be difficult for beginners to grasp MPI and it is very common to build (incorrect and inefficient) patterns that can easily be done with collectives. My recommendation is:

  1. Start from scratch, discard your current attempt.
  2. Learn about MPI_Scatterv as well as MPI_Gatherv. These are the only communication functions you need in your example. Also there is no need for separate code paths around those for the master.
  3. Think about your data layout. What is the shape of the matrices on each rank? How does the global matrix map to the local ones?
  4. Use variable names that describe their meaning unambiguously and concisely.
  5. Write your code in small steps and think carefully about every line and parameter.
  6. If it works, post it on Code Review. If it doesn't work or you are stuck, post a new question or update this. In both cases feel free to post a comment here.

Upvotes: 2
