GrundleMoof

Reputation: 299

Why does Python's multiprocessing Pool get() function block sometimes but not others?

I'm using multiprocessing.Pool in Python, and I don't understand something that's happening. Here's a dummy version that shows what I mean:

from multiprocessing import Pool
from time import sleep

def f1():
    for i in range(5):
        sleep(.1)
        print("f1:",i)
    print("f1 exiting")
    return('f1f1f1f1f1f1f1')

def f2():
    for i in range(10):
        sleep(.1)
        print("f2:",i)
    print("f2 exiting")
    return('f2f2f2ff2f2f2f2f2f')

pool = Pool(processes=2)

print('starting apply_async for p1, p2')
p1 = pool.apply_async(f1)
p2 = pool.apply_async(f2)
print('finished apply_async for p1, p2')


print('starting get() for p1, p2')
print(p1.get(timeout=10))
print(p2.get(timeout=10))
print('finished get() for p1, p2')


print('\n\n\ndone')

If I run this, f1 and f2 run and output at the same time:

starting apply_async for p1, p2
finished apply_async for p1, p2
starting get() for p1, p2
f1: 0
f2: 0
f2: 1
f1: 1
f2: 2
f1: 2
f2: 3
f1: 3
f1: 4
f2: 4
f1 exiting
f1f1f1f1f1f1f1
f2: 5
f2: 6
f2: 7
f2: 8
f2: 9
f2 exiting
f2f2f2ff2f2f2f2f2f
finished get() for p1, p2

So clearly get() is not blocking the execution of the rest of the main part of the program when p1 calls it; it goes immediately to p2.get().

However, if I instead do (note that f1 is slightly changed):

from multiprocessing import Pool
from time import sleep

def f1():
    for i in range(5):
        sleep(1)
        print("f1:",i)
    print("f1 exiting")
    return('f1f1f1f1f1f1f1')

def f2():
    for i in range(10):
        sleep(.1)
        print("f2:",i)
    print("f2 exiting")
    return('f2f2f2ff2f2f2f2f2f')


pool = Pool(processes=1)

print('starting apply_async for p1')
p1 = pool.apply_async(f1)
print('finished apply_async for p1')

print('starting get() for p1')
print(p1.get(timeout=10))
print('finished get() for p1')

print('calling f2()')
f2()

print('\n\n\ndone')

I get:

starting apply_async for p1
finished apply_async for p1
starting get() for p1
f1: 0
f1: 1
f1: 2
f1: 3
f1: 4
f1 exiting
f1f1f1f1f1f1f1
finished get() for p1
calling f2()
f2: 0
f2: 1
f2: 2
f2: 3
f2: 4
f2: 5
f2: 6
f2: 7
f2: 8
f2: 9
f2 exiting

So in this case, p1.get() IS blocking for the main part of the program. It also doesn't make a difference if I use 1 or 2 processes in this case.

I get that it's because in this case, f2 is not being called with one of the Pool workers, but I'm still confused. Even weirder to me, if I switch the order of f1 and f2 in the 2nd case, like:

pool = Pool(processes=1)

print('starting apply_async for p1')
p1 = pool.apply_async(f1)
print('finished apply_async for p1')

print('calling f2()')
f2()

print('starting get() for p1')
print(p1.get(timeout=10))
print('finished get() for p1')

It DOES start the get() for f1 while f2 is still executing:

starting apply_async for p1
finished apply_async for p1
calling f2()
f2: 0
f2: 1
f2: 2
f2: 3
f2: 4
f2: 5
f2: 6
f2: 7
f2: 8
f1: 0
f2: 9
f2 exiting
starting get() for p1
f1: 1
f1: 2
f1: 3
f1: 4
f1 exiting
f1f1f1f1f1f1f1
finished get() for p1

(You can see the f1: 0 in between f2: 8 and f2: 9.)

That's really confusing to me. In this case f2 had nothing to do with the Pool stuff, so how is it not blocking when it's called first?

Can someone clear up what's happening with Pool? I've read the docs, but they didn't really clear it up for me.

Upvotes: 0

Views: 367

Answers (1)

scnerd

Reputation: 6113

It is blocking in every case. What sets your first example apart from the second and third is that the main process prints nothing of its own between p1.get and p2.get, so the printout gives you no way to see whether it is blocking. f2 starts running in a worker as soon as you call apply_async(f2), which is why you see its output while the main process is still waiting on p1, but that has no bearing on your call to p1.get.
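
If you want to see the blocking directly, timing each get() call makes it visible. This is only a rough sketch: slow, fast and timed_gets are names I made up for the illustration, and I'm assuming multiprocessing.pool.ThreadPool as a stand-in for Pool because it exposes the same apply_async/get interface and runs without any process start-up concerns. With a real Pool I would expect the same pattern, but I have not measured it here.

```python
from multiprocessing.pool import ThreadPool  # assumption: stand-in for Pool, same API
from time import perf_counter, sleep


def slow():
    # Hypothetical task that takes about half a second.
    sleep(0.5)
    return "slow done"


def fast():
    # Hypothetical task that finishes well before slow() does.
    sleep(0.1)
    return "fast done"


def timed_gets():
    """Return the seconds spent inside the first and the second get() call."""
    with ThreadPool(processes=2) as pool:
        r_slow = pool.apply_async(slow)  # starts running right away
        r_fast = pool.apply_async(fast)  # also starts right away

        t0 = perf_counter()
        r_slow.get(timeout=10)  # blocks until slow() has returned
        t1 = perf_counter()
        r_fast.get(timeout=10)  # fast() finished long ago, so this returns at once
        t2 = perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    first, second = timed_gets()
    print(f"first get():  {first:.2f} s")
    print(f"second get(): {second:.2f} s")
```

The first get() should take roughly as long as slow() itself, while the second comes back almost immediately because its task already finished in the background while the main thread was waiting.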

Upvotes: 1
