Ellie

Reputation: 21

How to wait for all multiprocessing.Processes to complete before continuing?

I am learning about Python multiprocessing and trying to understand how to make my code wait for all processes to finish before continuing with the rest of the code. I thought the join() method should do the job, but the output of my code is not what I expected from using it.

Here is the code:

from multiprocessing import Process
import time 

def fun():

    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def fun2():

    print('starting fun2')
    time.sleep(5)
    print('finishing fun2')

def fun3():

    print('starting fun3')
    print('finishing fun3')   

if __name__ == '__main__':
    processes = []
    print('starting main')
    for i in [fun, fun2, fun3]:
        p = Process(target=i)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  
    print('finishing main')

g=0
print("g",g)

I expected all processes started under if __name__ == '__main__': to finish before the lines g=0 and print("g",g) run, so I expected output like this:

starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0

But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):

starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0

The question is: how do I write the code so that all processes finish first and only then does the rest of the (non-multiprocessing) code run, giving me the expected output above? I run the code from the command prompt on Windows, in case it matters.

Upvotes: 2

Views: 4262

Answers (1)

Gustavo Kawamoto

Reputation: 3067

On waiting for the processes to finish:

You can just join each Process in your list, something like this:

import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    processes = [
        multiprocessing.Process(target=func1),
        multiprocessing.Process(target=func2),
        multiprocessing.Process(target=func3),
    ]
    for p in processes:
        p.start()   # start all three first so they run concurrently

    for p in processes:
        p.join()    # block until each child process has exited

if __name__ == '__main__':
    main()

But if you're thinking about making your processes more complex, try using a Pool:

import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    result = []
    with multiprocessing.Pool() as pool:
        result.append(pool.apply_async(func1))
        result.append(pool.apply_async(func2))
        result.append(pool.apply_async(func3))

        # apply_async returns an AsyncResult; wait() blocks until
        # the corresponding task has completed
        for r in result:
            r.wait()

if __name__ == '__main__':
    main()

More info on Pool
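
As a side note beyond the original answer: apply_async returns an AsyncResult, so if your functions return values you can collect them with get(), which also blocks until the result is ready. A minimal sketch, using a hypothetical square() in place of the funcs above:

import multiprocessing
import time

def square(x):
    time.sleep(1)
    return x * x

def main():
    with multiprocessing.Pool() as pool:
        # apply_async returns immediately with an AsyncResult per task
        results = [pool.apply_async(square, (n,)) for n in range(3)]
        # get() blocks until each result is ready, then returns the value
        print([r.get() for r in results])  # prints [0, 1, 4]

if __name__ == '__main__':
    main()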

On why g 0 prints multiple times:

This is happening because your processes are started with the spawn or forkserver start method (spawn is the default on Windows), and the g=0 and print statements sit outside any function and outside the if __name__ == '__main__': block.
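
You can check which start method is active (this snippet is mine, not part of the original answer):

import multiprocessing

if __name__ == '__main__':
    # 'spawn' on Windows (and macOS since Python 3.8), usually 'fork' on Linux
    print(multiprocessing.get_start_method())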

From the docs:

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

(...)

This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.

Similar restrictions apply if a pool or manager is created in the main module.

In short, the module-level statements run again in every child process, because each spawned interpreter imports your .py file as a module.
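
Applied to the code in the question, the fix is simply to move the module-level statements under the guard, so they run only in the parent process and only after every join(). A minimal sketch (fun2 and fun3 left out for brevity):

from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

if __name__ == '__main__':
    print('starting main')
    processes = [Process(target=f) for f in (fun,)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print('finishing main')

    # moved inside the guard: the spawned children import this file
    # but never enter this block, so "g 0" prints exactly once
    g = 0
    print("g", g)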

Upvotes: 1
