Scalahansolo

Reputation: 3045

Python MPI setup issues

I am working with a large chunk of serial code, part of which I am trying to get running under MPI. My files (parents with their child files) are laid out as shown below. My main issue is that the file that imports MPI is causing N processes to spawn from the start.

File1 (imports ->) File2 (imports ->) File3 (where MPI is imported)

What I am trying to do is keep most of my serial code the same and have only parts of it run under MPI. I have something similar to the following:

File1.py

import file2

def main():
    print("Testing")
    function1()

def function1():
    function2()

if __name__ == "__main__":
    main()

File2.py

import file3

def function2():
    answers = function3()

File3.py

def function3():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if rank == 1:
        ...
    elif rank == 0:
        ...
    ...  # Do work with MPI and return stuff to file2.py

What is happening here is that Testing gets printed to the console by every process that file1.py starts with. I.e. 'mpiexec -n 2 python file1.py' results in Testing being printed twice, when I expect the print statement to run only once. However, running the same command without the MPI import results in just one print of Testing. So is it possible to isolate MPI to just file 3 and not all three files?

Upvotes: 2

Views: 284

Answers (3)

NOhs

Reputation: 2830

While MPI has traditionally been a 100% parallel programming model, with newer MPI versions it is possible (and with mpi4py quite easy) to spawn MPI processes from within your Python script.

I give an example of how to do this in an answer to basically the same question: https://stackoverflow.com/a/50022229/2305545

The example there is taken from the dynamic process management tutorial of the mpi4py documentation.

Note that some MPI implementations, like MSMPI or the Cray MPICH version, do not support this type of process spawning.
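A condensed sketch of that tutorial's approach, assuming a hypothetical worker script named cpi_worker.py (the file names and the toy pi computation are illustrative): a plain serial script spawns MPI workers on demand and collects their partial results.

```python
# cpi_master.py -- sketch of mpi4py dynamic process management:
# a serial script that spawns MPI workers on demand.
# The worker script name "cpi_worker.py" is an assumption here.
import sys

def pi_partial(n, rank, size):
    """Midpoint-rule slice of the integral of 4/(1+x^2) over [0, 1]
    handled by one worker (round-robin over the n intervals)."""
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + (h * (i + 0.5)) ** 2)
                   for i in range(rank, n, size))

def spawn_and_collect(n_workers=4, n_intervals=100):
    from mpi4py import MPI
    import numpy

    # Start n_workers fresh python processes running the worker script;
    # Spawn returns an intercommunicator connecting us to them.
    comm = MPI.COMM_SELF.Spawn(sys.executable,
                               args=["cpi_worker.py"],
                               maxprocs=n_workers)
    n = numpy.array(n_intervals, dtype="i")
    comm.Bcast([n, MPI.INT], root=MPI.ROOT)     # tell workers the problem size
    pi = numpy.array(0.0, dtype="d")
    comm.Reduce(None, [pi, MPI.DOUBLE],
                op=MPI.SUM, root=MPI.ROOT)      # sum of the workers' slices
    comm.Disconnect()
    return float(pi)
```

Each spawned worker would call `pi_partial(n, comm.Get_rank(), comm.Get_size())` on the intercommunicator returned by `MPI.Comm.Get_parent()` and `Reduce` its slice back, as in the tutorial.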

Upvotes: 0

jmd_dk

Reputation: 13120

MPI parallelisation is meant to be global in the sense that mpiexec initializes your entire program n times simultaneously, with each process being distinguished only by its rank. The parallelisation model you describe sounds more like a fork/join model, where the parallelisation begins and ends somewhere in the middle of the program (as in thread programming).

If you keep the intended mindset of MPI in mind, it should not be too hard to make a Python program that behaves as you want. Sure, several processes may run the very same startup code redundantly, but unless this startup is demanding (e.g. takes a long time or a lot of memory), this is not a problem.
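A minimal sketch of that mindset, using a hypothetical script and toy workload: every rank executes the whole file, the cheap startup runs everywhere, and only the expensive loop is split by rank.

```python
# sketch.py -- every rank runs this entire file; cheap startup is
# duplicated, the expensive part is divided by rank. Run as e.g.
#   mpiexec -n 4 python sketch.py
# (the file name and the sum-of-squares workload are illustrative).

def my_indices(n_items, rank, size):
    """Round-robin split: the indices this rank is responsible for."""
    return range(rank, n_items, size)

def main():
    from mpi4py import MPI              # imported lazily, as in file3.py above
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    data = list(range(1000))            # duplicated "startup" work, every rank
    local = sum(data[i] ** 2 for i in my_indices(len(data), rank, size))

    total = comm.reduce(local, op=MPI.SUM, root=0)  # combine on rank 0
    if rank == 0:                       # only one rank prints the result
        print("sum of squares:", total)
```

The point is that nothing forks mid-program: all n processes exist from the start, and rank checks decide who does (and prints) what.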

If you really want to spawn MPI processes in the middle of your Python program, I'm sure you could hack something together using e.g. the subprocess module, from which you can start new Python instances through mpiexec. However, the final result of the parallel session cannot be communicated back to the original Python program without first dumping it to disk, unless the result is small enough to be passed back through the subprocess's output (e.g. its stdout).
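A sketch of that subprocess route, assuming a hypothetical MPI script mpi_part.py whose rank 0 prints a single number as its result:

```python
# The serial program shells out to mpiexec and reads the parallel result
# back from the child's stdout -- so this only works for results small
# enough to print (otherwise, dump them to disk instead).
import subprocess
import sys

def parse_result(stdout_text):
    """Take the last non-empty line of the child's output as the result."""
    lines = [ln for ln in stdout_text.splitlines() if ln.strip()]
    return float(lines[-1])

def run_parallel_part(script="mpi_part.py", n_procs=2):
    # "mpi_part.py" is an assumption: an MPI script whose rank 0 prints
    # one number and whose other ranks stay quiet.
    proc = subprocess.run(
        ["mpiexec", "-n", str(n_procs), sys.executable, script],
        capture_output=True, text=True, check=True)
    return parse_result(proc.stdout)
```

This keeps the rest of the serial program MPI-free, at the cost of serializing the result through a pipe.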

Upvotes: 0

Rob Latham

Reputation: 5223

As with all MPI programs, not just Python programs, mpiexec starts up N instances of your program. Even though your top-level code is not "using MPI", mpiexec is still running N copies of the Python interpreter.

You can try this with all kinds of programs. Try 'mpiexec -np 3 date'. date is not an MPI program, but you'll see three copies of its output nonetheless.

To get what you want, you might have to look into the dynamic process facilities, but those are not universally supported. Conceptually, you would spawn N processes to do the MPI work. It's a nice mental model but not common in practice.

Upvotes: 0
