Reputation: 431
I'm a beginner in using MPI. Here I wrote a very simple program to test if MPI can run. Here is my hello.c:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int numprocs, rank, namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
MPI_Barrier(MPI_COMM_WORLD);
printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
MPI_Finalize();
}
I use to node to test, the hostfile is: node1 node2
So I have two machines with name node1 and node2. I can ssh to each other without password.
I launch the program by typing: mpirun -np 2 -f hostfile ./hello
.
The executable hello is in the same directory in both machine.
Then after I run, I get an error:
Fatal error in PMPI_Barrier: Other MPI error, error stack: PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed MPIR_Barrier_impl(331)....: Failure during collective MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....: dequeue_and_set_error(596): Communication error with rank 0 Fatal error in PMPI_Barrier: Other MPI error, error stack: PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed MPIR_Barrier_impl(331)....: Failure during collective MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....: dequeue_and_set_error(596): Communication error with rank 1
If I comment out the MPI_Barrier(), it can work properly. It seems the communication between machines has problem? Or I didn't install openmpi correctly? Any ideas?
I'm using Ubuntu 12.10
I got some hints: This doesn't work well in MPICH2, if I use openmpi, then it works. I installed MPICH just by sudo apt-get install mpich2. Do I miss something? The size of mpich2 is much smaller than openmpi
Upvotes: 1
Views: 2487
Reputation: 373
In /etc/hosts, newer versions of some Linux distros add the following types of lines at the top of the file:
127.0.0.1 localhost
127.0.0.1 [hostname]
This should be changed so that the hostname line contains your actual IP address. The MPI hydra process will abort if you do not make this change with errors like:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(292)......:
MPIR_Barrier_or_coll_fn(121):
MPIR_Barrier_intra(83)......:
dequeue_and_set_error(596)..: Communication error with rank 0
Upvotes: 0