Here's an example Python script I'm trying to run
import mpi4py.futures as mp

def some_maths(x, y):
    return (x**2) / (1 + y)

if __name__ == '__main__':
    multiargs = [(1, 5), (2, 6), (3, 7), (5, 8), (7, 9), (9, 10)]

    # Parallel execution
    _PoolExecutor = mp.MPIPoolExecutor
    with _PoolExecutor(max_workers=len(multiargs)) as p:
        out = p.starmap(some_maths, multiargs)
        for r in out:
            print(r)
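In case the launch method matters: I've been assuming that a plain python invocation is the intended usage, with mpi4py.futures spawning the workers dynamically. The script name below is just a placeholder, and as I understand the mpi4py.futures docs the script can also be started under mpiexec (where one rank acts as the master and the rest form the worker pool):

$ python script.py
$ mpiexec -n 7 python -m mpi4py.futures script.py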
We are upgrading from Python 3.10 to 3.13. In 3.10, with mpi4py==3.1.5, this runs fine. In 3.13, regardless of whether I use mpi4py==3.1.5 or the newer 4.0.1, I get an MPI communication error:
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
There are different versions of OpenMPI installed on the system, so I checked which one is used when I run the script under Python 3.10, where it works, and found (via mpi4py.MPI.Get_version()) that it uses OpenMPI 2.1.1. The newer Python installation uses OpenMPI 4.1.7.
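For reference, this is roughly the check I'm running; my understanding is that Get_version() only reports the MPI standard version, while Get_library_version() returns the full library identification string, so the latter may be the more telling of the two:

from mpi4py import MPI

# MPI standard version implemented by the library, e.g. (3, 1)
print(MPI.Get_version())

# Full library identification string, e.g. "Open MPI v4.1.7, ..."
print(MPI.Get_library_version())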
I enabled verbose output as the third recommendation suggests, but the result wasn't helpful, at least not to me:
mca: base: components_register: registering framework btl components
mca: base: components_register: found loaded component self
mca: base: components_register: component self register function successful
mca: base: components_open: opening btl components
mca: base: components_open: found loaded component self
mca: base: components_open: component self open function successful
select: initializing btl component self
select: init of component self returned success
... above repeated several times ...
mca: bml: Using self btl for send to [[43610,2],0] on node base
mca: bml: Using self btl for send to [[43610,2],1] on node base
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** reported by process [2858024962,1]
*** on a NULL communicator
*** Unknown error
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
*** An error occurred in MPI_Init_thread
*** reported by process [2858024962,0]
*** on a NULL communicator
*** Unknown error
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
My initial thought is that I could just reinstall mpi4py for my 3.13 installation and specify at install time that it should build against the older OpenMPI. It seems this is done via:
$ env MPICC=/path/to/mpicc python -m pip install mpi4py
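More concretely, I am assuming something like the following, where the path is a placeholder for wherever the older OpenMPI's mpicc lives, and the extra pip flags are my guess at what is needed to force a build from source rather than reusing a cached or prebuilt wheel:

$ env MPICC=/path/to/old/openmpi/bin/mpicc \
      python -m pip install --no-binary=mpi4py --no-cache-dir --force-reinstall mpi4py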
First, is this a reasonable approach? Second, is there a simple way to find the mpicc path corresponding to the older version? I've found a folder on the system for that version, but there is no mpicc in it.
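For the second question, I assume I'm looking for something along these lines, but so far I haven't found the right binary (these are the commands I believe OpenMPI provides; the wrapper may of course live under a different prefix on this system, and they only report on whichever installation is first on PATH):

$ mpicc --showme:version                    # Open MPI's compiler wrapper reports its own version
$ ompi_info | grep -E 'Open MPI:|Prefix'    # version and install prefix of the MPI currently found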
Finally, are there any other troubleshooting steps I should try here?