Reputation: 553
I'm struggling to understand how to set up a distributed MPI cluster with ipython/ipyparallel. I don't have a strong MPI background.
I've followed the instructions in the ipyparallel docs (Using ipcluster in mpiexec/mpirun mode), and this works fine for distributing computation on a single-node machine. So I create an mpi profile and configure it as per those instructions:
$ ipython profile create --parallel --profile=mpi
$ vim ~/.ipython/profile_mpi/ipcluster_config.py
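For reference, a minimal sketch of the relevant part of that config (the launcher class name has varied across IPython/ipyparallel versions, so treat the exact spelling as an assumption to check against the docs):
# ~/.ipython/profile_mpi/ipcluster_config.py (excerpt)
c = get_config()

# Tell ipcluster to launch engines via mpiexec/mpirun.
# Depending on the version, this may be spelled 'MPI' or 'MPIEngineSetLauncher'.
c.IPClusterEngines.engine_launcher_class = 'MPI'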
Then on host A I start a controller and 4 MPI engines:
$ ipcontroller --ip='*' --profile=mpi
$ ipcluster engines --n=4 --profile=mpi
Running the following snippet:
from ipyparallel import Client
from mpi4py import MPI

c = Client(profile='mpi')
view = c[:]

print("Client MPI.COMM_WORLD.Get_size()=%s" % MPI.COMM_WORLD.Get_size())
print("Client engine ids %s" % c.ids)

def _get_rank():
    from mpi4py import MPI
    return MPI.COMM_WORLD.Get_rank()

def _get_size():
    from mpi4py import MPI
    return MPI.COMM_WORLD.Get_size()

print("Remote COMM_WORLD ranks %s" % view.apply_sync(_get_rank))
print("Remote COMM_WORLD size %s" % view.apply_sync(_get_size))
yields
Client MPI.COMM_WORLD.Get_size()=1
Client engine ids [0, 1, 2, 3]
Remote COMM_WORLD ranks [1, 0, 2, 3]
Remote COMM_WORLD size [4, 4, 4, 4]
Then on host B I start another 4 MPI engines. Running the snippet again yields
Client MPI.COMM_WORLD.Get_size()=1
Client engine ids [0, 1, 2, 3, 4, 5, 6, 7]
Remote COMM_WORLD ranks [1, 0, 2, 3, 2, 3, 0, 1]
Remote COMM_WORLD size [4, 4, 4, 4, 4, 4, 4, 4]
It seems that the engines from each ipcluster command are grouped into separate communicators of size 4, hence the duplicate ranks. And there's only one MPI process for the client.
Questions:
EDIT
The answer to the first question seems to be that all MPI nodes have to be brought up at once. This is because MPI - Add/remove node while program is running suggests that child nodes can be added through MPI_Comm_spawn; however, according to the MPI_Comm_spawn documentation:
MPI_Comm_spawn tries to start maxprocs identical copies of the MPI program specified by command, establishing communication with them and returning an intercommunicator. The spawned processes are referred to as children. The children have their own MPI_COMM_WORLD, which is separate from that of the parents.
A quick grep through the ipyparallel code suggests that this functionality isn't employed.
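For illustration, here is a minimal mpi4py sketch of that behavior (the file names spawn_parent.py and spawn_child.py are hypothetical):

# spawn_parent.py
import sys
from mpi4py import MPI

# Spawn 2 children running spawn_child.py; returns an intercommunicator.
children = MPI.COMM_SELF.Spawn(sys.executable, args=['spawn_child.py'], maxprocs=2)

# The parent's COMM_WORLD is unaffected by the spawn.
print("parent COMM_WORLD size: %s" % MPI.COMM_WORLD.Get_size())
print("children in intercommunicator: %s" % children.Get_remote_size())
children.Disconnect()

# spawn_child.py
from mpi4py import MPI

parent = MPI.Comm.Get_parent()
# The children share a COMM_WORLD of their own, separate from the parent's.
print("child COMM_WORLD size: %s" % MPI.COMM_WORLD.Get_size())
parent.Disconnect()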
A partial answer to the second question is that a machinefile needs to be used so that MPI knows which remote machines it can create processes on.
The implication here is that the setup on each remote host is homogeneous, as provided by a cluster system like Torque/SLURM etc. Otherwise, if one is trying to use arbitrary remote hosts, one will have to do work to ensure that the environment mpiexec executes in is homogeneous.
A partial answer to the third question is no; ipyparallel can presumably work with remote MPI processes, but one needs to create one ipyparallel engine per MPI process.
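For example (a sketch, assuming passwordless SSH between the hosts, identical Python environments on both, and a hypothetical machinefile ~/mpi_hosts listing them), all eight engines could be launched in a single MPI universe directly:
$ mpiexec -machinefile ~/mpi_hosts -n 8 ipengine --profile=mpi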
Upvotes: 4
Views: 2623
Reputation: 38608
When you start engines with MPI in IPython parallel, it ultimately boils down to a single call of:
mpiexec [-n N] ipengine
It does no configuration of MPI. If you start multiple groups of engines on different hosts, each group will be in its own MPI universe, which is what you are seeing. The first thing to do is to make sure that everything works as you expect with a single call to mpiexec before you bring IPython parallel into it.
As mentioned in IPython parallel with a machine file, to use multi-host MPI, you typically need a machinefile to specify launching multiple engines on multiple hosts. For example:
# ~/mpi_hosts
machine1 slots=4
machine2 slots=4
You can use a simple test script for diagnostics:
# test_mpi.py
import os
import socket
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each process reports its host, pid, and rank/size within COMM_WORLD.
print("{host}[{pid}]: {rank}/{size}".format(
    host=socket.gethostname(),
    pid=os.getpid(),
    rank=comm.rank,
    size=comm.size,
))
And run it:
$ mpiexec -machinefile ~/mpi_hosts -n 8 python test_mpi.py
machine1[32292]: 0/8
machine1[32293]: 1/8
machine1[32294]: 2/8
machine1[32295]: 3/8
machine2[32296]: 4/8
machine2[32297]: 5/8
machine2[32298]: 6/8
machine2[32299]: 7/8
Once that's working as expected, you can add
c.MPILauncher.mpi_args = ["-machinefile", "~/mpi_hosts"]
to your ~/.ipython/profile_default/ipcluster_config.py
and start your engines with
ipcluster start -n 8
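As a quick check (a sketch; report is a throwaway helper defined here, not part of ipyparallel), the engines should now report hostnames from both machines and distinct ranks 0-7 in one shared COMM_WORLD:

from ipyparallel import Client

c = Client()

def report():
    # Runs on each engine: hostname plus rank in the shared MPI COMM_WORLD.
    import socket
    from mpi4py import MPI
    return socket.gethostname(), MPI.COMM_WORLD.Get_rank()

print(c[:].apply_sync(report))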
Upvotes: 3