George Lutsenko

Reputation: 61

Using MPI_Barrier leads to a fatal error

I get strange behavior from my simple MPI program. I spent time trying to find an answer myself, but I can't. I read some questions here, like OpenMPI MPI_Barrier problems, MPI_SEND stops working after MPI_BARRIER, and Using MPI_Bcast for MPI communication. I also read the MPI tutorial on mpitutorial.com. My program just modifies an array that was broadcast from the root process, then gathers the modified arrays into one array and prints them. The problem is that when I run the code listed below with MPI_Barrier(MPI_COMM_WORLD) uncommented, I get an error.

#include "mpi/mpi.h"
#define N 4

// multiply every element of the row by k
void transform_row(int* row, const int k) {
  for (int i = 0; i < N; ++i) {
    row[i] *= k;
  }
}
const int root = 0;


int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, ranksize;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ranksize);
  if (rank == root) {
    int* arr = new int[N];
    for (int i = 0; i < N; ++i) {
      arr[i] = i * i + 1;
    }
    MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
  }
  int* arr = new int[N];
  MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
  //MPI_Barrier(MPI_COMM_WORLD);
  transform_row(arr, rank * 100);
  int* transformed = new int[N * ranksize];
  MPI_Gather(arr, N, MPI_INT, transformed, N, MPI_INT, root, MPI_COMM_WORLD);
  if (rank == root) {
    for (int i = 0; i < ranksize; ++i) {
      for (int j = 0; j < N ; j++) {
        printf("%i ", transformed[i * N + j]);
      }
      printf("\n");
    }
  }
  MPI_Finalize();
  return 0;
}

The error occurs when the number of processes is greater than 1. The error:

Fatal error in PMPI_Barrier: Message truncated, error stack:
PMPI_Barrier(425)...................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)..............: Failure during collective
MPIR_Barrier_impl(327)..............:
MPIR_Barrier(292)...................:
MPIR_Barrier_intra(150).............:
barrier_smp_intra(111)..............:
MPIR_Bcast_impl(1452)...............:
MPIR_Bcast(1476)....................:
MPIR_Bcast_intra(1287)..............:
MPIR_Bcast_binomial(239)............:
MPIC_Recv(353)......................:
MPIDI_CH3U_Request_unpack_uebuf(568): Message truncated; 16 bytes received but buffer size is 1

I understand that some problem with a buffer exists, but when I use MPI_Buffer_attach to attach a big buffer to MPI, it doesn't help.

It seems I need to increase this buffer, but I don't know how to do this.

XXXXXX@XXXXXXXXX:~/test_mpi$ mpirun --version
HYDRA build details:
    Version:       3.2
    Release Date:  Wed Nov 11 22:06:48 CST 2015

So help me please.

Upvotes: 1

Views: 2077

Answers (1)

Gilles Gouaillardet

Reputation: 8395

One issue is that MPI_Bcast() is invoked twice by the root rank but only once by the other ranks. On top of that, the arr the root rank goes on to transform and gather is the second, uninitialized allocation; the initialized array only exists inside the if block.

MPI_Barrier() might only hide a problem like this; it cannot fix it. Here it actually surfaces it: on this MPICH build the SMP barrier is implemented on top of an internal one-byte broadcast (see barrier_smp_intra and MPIR_Bcast_impl in your error stack), and that internal receive gets matched with the root rank's stray second MPI_Bcast(), so 16 bytes (your four ints) arrive in a one-byte buffer, which is exactly the "Message truncated" error.

Also, note that if N is "large enough", the second MPI_Bcast() invoked by the root rank will likely hang: small messages are sent eagerly, so the unmatched send can complete locally, but above the eager threshold the rendezvous protocol makes the sender block until a matching receive is posted.

Here is how you can revamp the init/broadcast phase to fix these issues.

int* arr = new int[N];      // allocate on every rank
if (rank == root) {
    for (int i = 0; i < N; ++i) {
        arr[i] = i * i + 1; // only root fills in the initial data
    }
}
// every rank, root included, calls MPI_Bcast() exactly once
MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);

Note that in this case you can simply initialize arr on all the ranks, so you do not even need to broadcast the array.
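For example, here is a minimal sketch of that alternative; the values are the same ones your root-side loop computes, just produced locally on every rank:

int* arr = new int[N];
// every rank computes the same deterministic values,
// so no broadcast is needed at all
for (int i = 0; i < N; ++i) {
    arr[i] = i * i + 1;
}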

As a side note, MPI programs typically

#include <mpi.h>

and are then built with mpicc for compilation/linking (this is a wrapper that invokes the real compiler after setting the include/library paths and linking the MPI libraries).
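For instance, assuming the source is saved as test_mpi.cpp (a placeholder name), a typical build-and-run sequence looks like this; since the code uses new, the C++ wrapper mpicxx shipped with MPICH is the natural choice:

mpicxx -o test_mpi test_mpi.cpp
mpirun -np 4 ./test_mpi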

Upvotes: 4
