Yan Li

Reputation: 431

MPI_Barrier doesn't work properly in Ubuntu

I'm a beginner with MPI. I wrote a very simple program to test whether MPI runs at all. Here is my hello.c:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}

I use two nodes for testing. The hostfile is:

node1
node2

So I have two machines named node1 and node2, and each can ssh to the other without a password.

I launch the program with:

mpirun -np 2 -f hostfile ./hello

The executable hello is in the same directory on both machines.

When I run it, I get this error:

Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....:
MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 0
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....:
MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 1

If I comment out the MPI_Barrier() call, the program works properly. Is there a problem with the communication between the machines, or did I not install Open MPI correctly? Any ideas?

I'm using Ubuntu 12.10.

I've found a hint: it doesn't work with MPICH2, but it does work if I use Open MPI. I installed MPICH2 simply with sudo apt-get install mpich2. Am I missing something? The mpich2 package is much smaller than openmpi.

Upvotes: 1

Views: 2487

Answers (1)

Javaxtreme

Reputation: 373

Newer versions of some Linux distributions add lines like the following at the top of /etc/hosts:

127.0.0.1 localhost
127.0.0.1 [hostname]

Change the hostname line so that it contains the machine's actual IP address instead of 127.0.0.1. If you don't make this change, MPICH's Hydra process manager will abort with errors like:

Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(292)......: 
MPIR_Barrier_or_coll_fn(121): 
MPIR_Barrier_intra(83)......: 
dequeue_and_set_error(596)..: Communication error with rank 0

Upvotes: 0
