Reputation: 1552
I use mpi4py to parallelize my Python application. I noticed that I run into deadlocks during MPI.Gather whenever I increase the number of processes or the size of the gathered arrays too much. Here is a minimal example that reproduces the problem:
from mpi4py import MPI
import numpy as np

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()


def test():
    # Each rank fills its array with its own rank number.
    arr = RANK * np.ones((100, 400, 15), dtype='int64')

    # Only the root needs an allocated receive buffer.
    recvbuf = None
    if RANK == 0:
        recvbuf = np.empty((SIZE,) + arr.shape, dtype=arr.dtype)

    print("%s gathering" % RANK)
    COMM.Gather([arr, arr.size, MPI.LONG], recvbuf, root=0)
    print("%s done" % RANK)

    if RANK == 0:
        # Slot i of the gathered buffer should have come from rank i.
        for i in range(SIZE):
            assert np.all(recvbuf[i] == i)


if __name__ == '__main__':
    test()
Executing this gives:
$ mpirun -n 4 python bug.py
1 gathering
2 gathering
3 gathering
0 gathering
1 done
2 done
while processes 0 and 3 hang indefinitely. However, if I change the array dimensions to (10, 400, 15), or run the script with -n 2, everything works as expected.
Am I missing something? Is this a bug in OpenMPI or mpi4py?
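For reference, a quick way to quantify the per-rank message size in the failing versus the working configuration, computed from the shape and dtype above:
import numpy as np

# Bytes each rank sends to the root in the failing and the working case.
failing = np.zeros((100, 400, 15), dtype='int64')
working = np.zeros((10, 400, 15), dtype='int64')
print(failing.nbytes)  # 4800000 bytes, roughly 4.6 MiB per rank
print(working.nbytes)  # 480000 bytes, roughly 0.46 MiB per rank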
Upvotes: 1
Views: 829
Reputation: 1552
I just noticed that everything works fine with MPICH installed via Homebrew. So, in case anyone runs into a similar situation on OSX, a workaround is:
$ brew unlink open-mpi
$ brew install mpich
$ pip uninstall mpi4py
$ pip install mpi4py --no-cache-dir
Then, I had to add the line
127.0.0.1 <mycomputername>
to /etc/hosts in order for MPICH to work correctly.
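To check that the rebuilt mpi4py actually picked up MPICH rather than the previously linked OpenMPI, printing the MPI library version string should now mention MPICH (MPI.Get_library_version is part of mpi4py's standard API):
$ python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
If the output still mentions Open MPI, mpi4py was not rebuilt against the new library.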
Update:
By now, this issue should be fixed. The bug was reported, and updating OpenMPI to 4.0.1 fixed it for me.
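Assuming OpenMPI was installed through Homebrew, a quick way to check which version you have and to upgrade if needed:
$ mpirun --version
$ brew upgrade open-mpi
After upgrading the MPI library, mpi4py may need to be rebuilt against it, e.g. with pip uninstall mpi4py followed by pip install mpi4py --no-cache-dir as above.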
Upvotes: 1