Reputation: 61
I'm seeing strange behavior in a simple MPI program. I spent time trying to find an answer myself, but I couldn't. I read some questions here, such as "OpenMPI MPI_Barrier problems", "MPI_SEND stops working after MPI_BARRIER", and "Using MPI_Bcast for MPI communication", and I read the MPI tutorial on mpitutorial. My program just modifies an array that was broadcast from the root process, then gathers the modified arrays into one array and prints them. The problem is that when I use the code listed below with MPI_Barrier(MPI_COMM_WORLD) uncommented, I get an error.
#include "mpi/mpi.h"
#define N 4

void transform_row(int* row, const int k) {
    for (int i = 0; i < N; ++i) {
        row[i] *= k;
    }
}

const int root = 0;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, ranksize;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ranksize);
    if (rank == root) {
        int* arr = new int[N];
        for (int i = 0; i < N; ++i) {
            arr[i] = i * i + 1;
        }
        MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
    }
    int* arr = new int[N];
    MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
    //MPI_Barrier(MPI_COMM_WORLD);
    transform_row(arr, rank * 100);
    int* transformed = new int[N * ranksize];
    MPI_Gather(arr, N, MPI_INT, transformed, N, MPI_INT, root, MPI_COMM_WORLD);
    if (rank == root) {
        for (int i = 0; i < ranksize; ++i) {
            for (int j = 0; j < N ; j++) {
                printf("%i ", transformed[i * N + j]);
            }
            printf("\n");
        }
    }
    MPI_Finalize();
    return 0;
}
The error occurs when the number of processes is greater than 1. The error:
Fatal error in PMPI_Barrier: Message truncated, error stack:
PMPI_Barrier(425)...................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)..............: Failure during collective
MPIR_Barrier_impl(327)..............:
MPIR_Barrier(292)...................:
MPIR_Barrier_intra(150).............:
barrier_smp_intra(111)..............:
MPIR_Bcast_impl(1452)...............:
MPIR_Bcast(1476)....................:
MPIR_Bcast_intra(1287)..............:
MPIR_Bcast_binomial(239)............:
MPIC_Recv(353)......................:
MPIDI_CH3U_Request_unpack_uebuf(568): Message truncated; 16 bytes received but buffer size is 1
I understand that there is some problem with a buffer, but using MPI_Buffer_attach to attach a big buffer to MPI doesn't help.
It seems I need to increase this buffer, but I don't know how to do this.
XXXXXX@XXXXXXXXX:~/test_mpi$ mpirun --version
HYDRA build details:
Version: 3.2
Release Date: Wed Nov 11 22:06:48 CST 2015
So, please help me.
Upvotes: 1
Views: 2077
Reputation: 8395
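One issue is that MPI_Bcast() is invoked twice by the root rank, but only once by the other ranks. The root rank then uses an uninitialized arr. Judging by the error stack, the 16 bytes (N = 4 ints) of the unmatched MPI_Bcast() end up being received by the 1-byte broadcast that MPI_Barrier() performs internally, hence "Message truncated".
MPI_Barrier() might only hide the problem, but it cannot fix it.
Also, note that if N is "large enough", then the second MPI_Bcast() invoked by the root rank will likely hang.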
Here is how you can revamp the init/broadcast phase to fix these issues.
int* arr = new int[N];
if (rank == root) {
    for (int i = 0; i < N; ++i) {
        arr[i] = i * i + 1;
    }
}
MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
Note that in this case, you can simply initialize arr on all the ranks, so you do not even need to broadcast the array.
As a side note, MPI programs typically #include <mpi.h> and then use mpicc for compilation and linking (this is a wrapper that invokes the real compiler after setting the include/library paths and adding the MPI libraries). For C++ code such as this, the corresponding wrapper is usually mpicxx.
Upvotes: 4