Reputation: 21
I have a very simple MPI program where node 0 sends a character to node 1, but the send and receive get stuck whenever I use two or more different machines. The program works fine when I run several processes on a single machine. It seems to be a communication problem, but I can't figure out what it is.
Here's the code:
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, tag = 1;
    char inmsg, outmsg = 'x';
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sends one character to rank 1. */
        MPI_Send(&outmsg, 1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Rank 1 blocks until the character from rank 0 arrives. */
        MPI_Recv(&inmsg, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &stat);
    }

    MPI_Finalize();
    return 0;
}
Also, here are some important notes:
Any help would be appreciated. Thanks!
Upvotes: 0
Views: 546
Reputation: 21
I found a solution to my problem:
I was using MPICH and launching my program with mpirun. The problem, as far as I can tell, is that MPICH was using the wrong network interface. Each node has two interfaces: lo and ens4. From what I saw in other posts, lo is the loopback interface, which only carries traffic from a node to itself, while ens4 is used to communicate with other nodes. I verified this with the following commands:
$ ifconfig -a
(shows the available interfaces)
$ ping -I lo mpi-test-130b
-> FAILS
$ ping -I ens4 mpi-test-130b
-> SUCCESS
$ ping -I lo mpi-test-uaiw
-> SUCCESS
$ ping -I ens4 mpi-test-uaiw
-> FAILS
One possible solution is to run mpirun --mca btl_tcp_if_include ens4 so that mpirun uses the ens4 interface to communicate with the other node. But this didn't work for me, since --mca is an Open MPI option and MPICH doesn't recognize it. So I replaced MPICH with Open MPI:
$ sudo apt-get remove libcr-dev mpich mpich-doc
$ sudo apt install openmpi-bin openmpi-doc libopenmpi-dev
After installing Open MPI, my code worked. I hope this helps anyone who faces the same problem!
Upvotes: 0