Novice

Reputation: 633

MPI: Process 0 executing its code twice

I'm having a weird problem with an MPI program. Part of the code is supposed to be executed by the root (process zero) only, but process zero seems to execute it twice. For example,

root = 0;
if (rank == root) {
    cout << "Hello from process " << rank << endl;
}

gives

Hello from process 0

Hello from process 0

This only seems to happen when I use 16 or more processes. I've been trying to debug it for several days now and haven't been able to figure it out.

Since I don't know why this is happening, I'll copy my entire code here; I've kept it as clear as I can. The goal is to multiply two matrices (with simplifying assumptions). The problem occurs in the final if block.

#include <iostream>
#include <cstdlib>
#include <cmath>
#include "mpi.h"

using namespace std;

int main(int argc, char *argv[]) {
    if (argc != 2) {
        cout << "Use one argument to specify the N of the matrices." << endl;
        return -1;
    }

    int N = atoi(argv[1]);
    int A[N][N], B[N][N], res[N][N];

    int i, j, k, start, end, P, p, rank;

    int root=0;
    MPI::Status status;

    MPI::Init(argc, argv);

    rank = MPI::COMM_WORLD.Get_rank();
    P = MPI::COMM_WORLD.Get_size();
    p = sqrt(P);

    /* Designate the start and end position for each process. */
    start = rank * N/p;
    end = (rank+1) * N/p;

    if (rank == root) { // No problem here
        /* Initialize matrices. */
        for (i=0; i<N; i++)
            for (j=0; j<N; j++) {
                A[i][j] = N*i + j;
                B[i][j] = N*i + j;
            }

        cout << endl << "Matrix A: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << A[i][j];
                if(j==N-1)
                    cout << endl;
            }

        cout << endl << "Matrix B: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << B[i][j];
                if(j==N-1)
                    cout << endl;
            }
    }

    /* Broadcast B to all processes. */
    MPI::COMM_WORLD.Bcast(B, N*N, MPI::INT, 0);

    /* Scatter A to all processes. */
    MPI::COMM_WORLD.Scatter(A, N*N/p, MPI::INT, A[start], N*N/p, MPI::INT, 0);
    /* Compute your portion of the final result. */    
    for(i=start; i<end; i++)
        for(j=0; j<N; j++) {
            res[i][j] = 0;
            for(k=0; k<N; k++)
                res[i][j] += A[i][k]*B[k][j];
        }

    MPI::COMM_WORLD.Barrier();
    /* Gather results from all processes. */
    MPI::COMM_WORLD.Gather(res[start], N*N/p, MPI::INT, res, N*N/p, MPI::INT, 0);


    if (rank == root) { // HERE is the problem!
        // This chunk executes twice in process 0
        cout << endl << "Result of A x B: " << endl;
        for(i=0; i<N; ++i)
            for(j=0; j<N; ++j) {
                cout << "  " << res[i][j];
                if(j == N-1)
                    cout << endl;
            }
    }

    MPI::Finalize();
    return 0;
}

When I run the program with P = 16 and two 4x4 matrices:

>$ mpirun -np 16 ./myprog 4

Matrix A: 
  0  1  2  3
  4  5  6  7
  8  9  10  11
  12  13  14  15

Matrix B: 
  0  1  2  3
  4  5  6  7
  8  9  10  11
  12  13  14  15

Result of A x B: 
  6366632  0  0  0
  -12032  32767  0  0
  0  0  -1431597088  10922
  1  10922  0  0

Result of A x B: 
  56  62  68  74
  152  174  196  218
  248  286  324  362
  344  398  452  506

Why is it printing that first result? I would really appreciate it if someone could help me.

Upvotes: 0

Views: 724

Answers (1)

Zulan

Reputation: 22670

You have undefined behavior: you are corrupting your memory. Take your example with N=4 and P=16, so p=4 and therefore start=rank.

What do you do when you Scatter? You send 4 elements each to 16 processes, so MPI assumes that A on the root contains 64 elements, but it only contains 16. Furthermore, you receive them at every rank into A[start]. I am not even sure that expression is well defined here, but it should be equal to &A[start][0], which lies outside the memory allocated for A whenever rank >= 4. So you already read and write invalid memory, and the wildly invalid accesses continue in the compute loop and in the Gather.
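To make the arithmetic concrete, here is the Scatter call from the question with the numbers above annotated as comments:

// With N = 4, P = 16, p = 4:
// each rank receives N*N/p = 4 ints, so the root must supply
// 16 * 4 = 64 ints -- but A holds only N*N = 16.
// The receive buffer A[start] decays to &A[rank][0], which points
// past the last row of A once rank >= 4.
MPI::COMM_WORLD.Scatter(A, N*N/p, MPI::INT, A[start], N*N/p, MPI::INT, 0);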

Unfortunately, MPI programs can be difficult to debug, especially when it comes to memory corruption. Open MPI's documentation on debugging has very valuable information; read the entire page! Running your program under valgrind, as in mpirun -np 16 valgrind ..., would have told you about the issue.
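For example, applied to the invocation from the question (any valgrind options would go between valgrind and the program name):

>$ mpirun -np 16 valgrind ./myprog 4

Each rank then runs under its own copy of valgrind.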

Some other notable issues:

  • The C++ bindings of MPI have been deprecated for years. You should either use the C bindings in C++ or a high level binding such as Boost.MPI.

  • Variable-length arrays are not standard C++.

  • You don't need a Barrier before the Gather.

  • Make sure your code is not full of unchecked assumptions. Assert that P is a perfect square if you need it to be, and that N is divisible by p if you need it to be (a minimal sketch follows this list).

  • Never name two variables P and p.
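
A minimal sketch of such checks, using the non-deprecated C bindings (the assertion messages are my own):

#include <cassert>
#include <cmath>
#include <cstdlib>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    assert(argc == 2 && "expected one argument: the matrix size N");
    const int N = std::atoi(argv[1]);

    int P;
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    // Fail fast if the assumed decomposition does not hold.
    const int p = static_cast<int>(std::lround(std::sqrt(P)));
    assert(p * p == P && "number of processes must be a perfect square");
    assert(N % p == 0 && "N must be divisible by p");

    // ... the rest of the program ...

    MPI_Finalize();
    return 0;
}

Note that assert is compiled out with -DNDEBUG; for checks that must survive release builds, prefer an explicit if followed by MPI_Abort.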

Now I am struggling with what to recommend beyond the debugging tools. If you need fast parallel matrix multiplication, use a library. If you want to write nice high-level code as an exercise, use boost::mpi and some high-level matrix abstraction. If you want to write low-level code as an exercise, use std::vector<>(N*N), build your own 2D index, and think carefully about how to index it and how to access the correct chunks of memory. A sketch of that last option follows.
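The sketch below is one way the low-level version could look, not the only one: it uses the C bindings, one std::vector per buffer, a hand-rolled row-major 2D index, and a decomposition of N/P contiguous rows per rank. It assumes N is divisible by P, so it needs at least as many rows as processes (e.g. mpirun -np 4 ./myprog 16):

#include <mpi.h>

#include <cassert>
#include <cstdlib>
#include <iostream>
#include <vector>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, P;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    const int N = (argc == 2) ? std::atoi(argv[1]) : 16;
    assert(N % P == 0 && "N must be divisible by the number of processes");
    const int rows = N / P;  // contiguous rows owned by this rank

    std::vector<int> B(N * N), A, res;  // row-major: M[i*N + j]
    std::vector<int> localA(rows * N), localRes(rows * N);

    if (rank == 0) {
        A.resize(N * N);
        res.resize(N * N);
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                A[i * N + j] = B[i * N + j] = N * i + j;
    }

    // Every rank needs all of B.
    MPI_Bcast(B.data(), N * N, MPI_INT, 0, MPI_COMM_WORLD);

    // Each rank receives exactly rows*N ints into its own buffer; the
    // per-rank counts now add up to the N*N ints that exist on the root.
    MPI_Scatter(rank == 0 ? A.data() : nullptr, rows * N, MPI_INT,
                localA.data(), rows * N, MPI_INT, 0, MPI_COMM_WORLD);

    // Multiply the local block of rows by B.
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < N; ++j) {
            int sum = 0;
            for (int k = 0; k < N; ++k)
                sum += localA[i * N + k] * B[k * N + j];
            localRes[i * N + j] = sum;
        }

    // No Barrier needed before the Gather.
    MPI_Gather(localRes.data(), rows * N, MPI_INT,
               rank == 0 ? res.data() : nullptr, rows * N, MPI_INT,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        std::cout << "Result of A x B:" << std::endl;
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j)
                std::cout << "  " << res[i * N + j];
            std::cout << std::endl;
        }
    }

    MPI_Finalize();
    return 0;
}

Separate local buffers make the ownership obvious: every rank touches only memory it actually has, and the send/receive counts are forced to match the buffer sizes.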

Upvotes: 2
