Reputation: 2802
I am getting familiar with Python's multiprocessing module. The following code works as expected:
# outputs 0 1 2 3
from multiprocessing import Pool

def run_one(x):
    print x
    return

pool = Pool(processes=12)
for i in range(4):
    pool.apply_async(run_one, (i,))
pool.close()
pool.join()
Now, however, if I wrap a function around the above code, the print
statements are not executed (or the output is redirected at least):
# outputs nothing
def run():
    def run_one(x):
        print x
        return

    pool = Pool(processes=12)
    for i in range(4):
        pool.apply_async(run_one, (i,))
    pool.close()
    pool.join()
If I move the run_one definition outside of run, the output is the expected one again when I call run():
# outputs 0 1 2 3
def run_one(x):
    print x
    return

def run():
    pool = Pool(processes=12)
    for i in range(4):
        pool.apply_async(run_one, (i,))
    pool.close()
    pool.join()
What am I missing here? Why isn't the second snippet printing anything? If I simply call run_one(i) directly instead of using apply_async, all three snippets produce the same output.
Upvotes: 5
Views: 3759
Reputation: 21654
Pool needs to pickle (serialize) everything it sends to its worker processes. Pickling a function only saves its name, and unpickling requires re-importing the function by that name. For that to work, the function must be defined at the top level of a module. Nested functions are not importable by the child, and even trying to pickle them raises an exception:
from multiprocessing.connection import _ForkingPickler

def run():
    def foo(x):
        pass
    _ForkingPickler.dumps(foo)  # multiprocessing custom pickler;
                                # same effect with pickle.dumps(foo)

run()

# Out:
Traceback (most recent call last):
...
AttributeError: Can't pickle local object 'run.<locals>.foo'
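To make the "only saves the name" point concrete, here is a minimal sketch using the plain pickle module under Python 3. It uses json.dumps purely as a stand-in for any importable top-level function, and make_nested is a hypothetical helper invented for this illustration. The exact exception type for the nested case can differ between Python versions, so the sketch catches both likely candidates.

```python
import json
import pickle


def make_nested():
    """Hypothetical helper: return a function defined inside another function."""
    def nested(x):
        return x
    return nested


# A top-level function is pickled by reference: the payload carries the
# module name and the function name, not the function's code.
payload = pickle.dumps(json.dumps)
stores_names = b"json" in payload and b"dumps" in payload

# Unpickling re-imports the module and looks the name up again,
# so we get the very same function object back.
restored = pickle.loads(payload)

# A nested function has no importable name, so pickling it fails.
try:
    pickle.dumps(make_nested())
    nested_error = None
except (AttributeError, pickle.PicklingError) as exc:
    nested_error = exc

print("payload stores module and name:", stores_names)
print("round trip gives same object:", restored is json.dumps)
print("nested function error:", repr(nested_error))
```

The same lookup-by-name happens in a worker process, which is why a function that only exists inside run() cannot be found there.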
The reason you don't see an exception is that Pool catches exceptions raised while pickling tasks in the parent and only re-raises them when you call .get() on the AsyncResult object that pool.apply_async() returns immediately.
That's why (with Python 2) you should always use it like this, even if your target function doesn't return anything (it still returns an implicit None):
results = [pool.apply_async(foo, (i,)) for i in range(4)]
# `pool.apply_async()` immediately returns AsyncResult (ApplyResult) object
for res in results:
    res.get()
Non-async Pool methods like Pool.map() and Pool.starmap() use the same (asynchronous) low-level functions under the hood as their asynchronous siblings, but they additionally call .get() for you, so you will always see an exception with these methods.
Python 3 has an error_callback parameter for asynchronous Pool methods that you can use instead to handle exceptions.
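As a rough sketch of that approach under Python 3, the nested run_one from the question can be submitted with an error_callback that collects whatever goes wrong. The wrapper name run_with_error_callback and the errors list are assumptions made for this example, and the timeout is again only a safety net.

```python
from multiprocessing import Pool


def run_with_error_callback():
    """Submit nested functions and collect the errors Pool would otherwise hide."""
    def run_one(x):  # nested like in the question, so it cannot be pickled
        print(x)

    errors = []
    with Pool(processes=2) as pool:
        results = [
            # error_callback receives the exception instance for a failed task.
            pool.apply_async(run_one, (i,), error_callback=errors.append)
            for i in range(4)
        ]
        for res in results:
            res.wait(timeout=30)  # wait() blocks but does not re-raise, unlike get()
    return errors


if __name__ == "__main__":
    for err in run_with_error_callback():
        print("task failed:", repr(err))
```

The callback runs in a helper thread of the parent process, so it should stay short. Appending to a list, as done here, is enough to make the otherwise silent failures visible.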
Upvotes: 7