TamTamHero

Boost MPI blocks on split call

I set up a Beowulf cluster of 3 VMs, with MPICH and Boost installed on each machine. My programs run fine on the cluster, but when I call split on a boost::mpi::communicator, execution blocks indefinitely.

Take the following code:

#include <boost/mpi.hpp>
#include <iostream>

namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    int group_id = world.rank() % 3;
    mpi::communicator local = world.split(group_id);

    std::cout << "I am process " << world.rank() << " of " << world.size() << "." << std::endl;
    std::cout << "I am sub-process " << local.rank() << " of " << local.size() << "." << std::endl;

    return 0;
}

When run on the cluster, the program hangs and prints nothing. But if I run it on a single node only (say, with -np 9), it works fine:

I am process 5 of 9.
I am process 2 of 9.
I am process 3 of 9.
I am process 1 of 9.
I am process 6 of 9.
I am process 7 of 9.
I am process 0 of 9.
I am process 4 of 9.
I am sub-process 2 of 3.
I am sub-process 0 of 3.
I am sub-process 1 of 3.
I am sub-process 2 of 3.
I am sub-process 1 of 3.
I am sub-process 1 of 3.
I am sub-process 0 of 3.
I am process 8 of 9.
I am sub-process 2 of 3.
I am sub-process 0 of 3.

Removing the split call makes the example run as intended across the 3 nodes, so the split call is clearly the culprit.

Any idea what I'm doing wrong with split?

Answers (1)

TamTamHero

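I finally found the problem: mpirun was sometimes trying to use the wrong network interface for communication. Specifying the correct interface when running mpirun makes everything work!

This also explains why only split was affected: rank() and size() are purely local queries, whereas split is a collective operation that has to exchange data between all processes. It was therefore the first call in the program that actually needed the nodes to talk to each other.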

Here is the parameter to pass to mpirun:

--mca btl_tcp_if_include [your_network_interface]
