Reputation: 2649
If you open a Jupyter Notebook and run this:
    import multiprocessing

    def f(x):
        a = 3 * x
        pool = multiprocessing.Pool(processes=1)
        global g
        def g(j):
            return a * j
        return pool.map(g, range(5))

    f(1)
You will get the following error:
    Process ForkPoolWorker-1:
    Traceback (most recent call last):
      File "/Users/me/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/Users/me/anaconda3/lib/python3.5/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/Users/me/anaconda3/lib/python3.5/multiprocessing/pool.py", line 108, in worker
        task = get()
      File "/Users/me/anaconda3/lib/python3.5/multiprocessing/queues.py", line 345, in get
        return ForkingPickler.loads(res)
    AttributeError: Can't get attribute 'g' on <module '__main__'>
I'm trying to understand whether this is a bug or a feature.
I'm trying to get this working because in my real case f is essentially an easily parallelizable for loop (only one parameter changes per iteration), but each iteration takes a long time. Am I approaching the problem correctly, or is there an easier way? (Note: f itself will be called several times with different parameters throughout the notebook.)
Upvotes: 3
Views: 1707
Reputation: 2754
If you want to call g with more arguments than just the element of the iterable that pool.map passes in, you can use functools.partial like this:
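    import multiprocessing
    import functools

    def g(a, j):
        return a * j

    def f(x):
        a = 3 * x
        # Bind a as g's first argument; the result only needs j
        g_with_a = functools.partial(g, a)
        # The with block shuts the pool down once map() has returned
        with multiprocessing.Pool(processes=1) as pool:
            return pool.map(g_with_a, range(5))

    f(1)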
functools.partial takes a function plus any number of positional and keyword arguments, and returns a new callable that behaves like the original function with those arguments already filled in, so it only takes the arguments you didn't pass to partial.
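A short sketch of what partial stores and how the resulting callable behaves. The value 3 and the name times_ten are illustrative only; func, args and keywords are the standard attributes of a partial object.

```python
import functools

def g(a, j):
    return a * j

# Freeze g's first positional argument to 3 (an illustrative value)
g_with_a = functools.partial(g, 3)

print(g_with_a(5))         # 15, same as g(3, 5)
print(g_with_a.func is g)  # True
print(g_with_a.args)       # (3,)
print(g_with_a.keywords)   # {}

# Keyword arguments can be frozen as well: here j is fixed and a is supplied later
times_ten = functools.partial(g, j=10)
print(times_ten(4))        # 40, same as g(4, j=10)
```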
The object returned by partial can be pickled without problems, i.e. passed to pool.map, as long as you're using Python 3 and the wrapped function (g here) is itself defined at module level, because pickle stores g by name.
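To see the difference pickle makes, here is a sketch comparing a partial with an equivalent closure. make_closure is a hypothetical helper standing in for a nested function like the g in the question (without the global declaration); the exact exception type for the closure varies between Python versions, so both candidates are caught.

```python
import functools
import pickle

def g(a, j):
    return a * j

def make_closure(a):
    # Nested function closing over a: pickle cannot look it up by name
    def inner(j):
        return a * j
    return inner

# The partial is reduced roughly to (reference to g, frozen args, frozen keywords)
g_with_a = functools.partial(g, 3)
restored = pickle.loads(pickle.dumps(g_with_a))
print(restored(5))  # 15

# The equivalent closure cannot be pickled
try:
    pickle.dumps(make_closure(3))
    closure_picklable = True
except (pickle.PicklingError, AttributeError) as exc:
    closure_picklable = False
    print("closure failed:", type(exc).__name__)
```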
This is essentially the same approach Darth Kotik described in his answer, but you don't have to write the Calculator class yourself, because partial already does what you want.
Upvotes: 1
Reputation: 2351
It works just fine if you define g outside of f, at module level:
    import multiprocessing

    def g(j):
        return 4 * j

    def f():
        pool = multiprocessing.Pool(processes=1)
        return pool.map(g, range(5))

    f()
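Why placement matters: pool.map pickles g by reference (its module and name), and each worker looks that name up in its own __main__. The Pool starts its workers when it is created, so with the fork start method shown in the question's traceback (ForkPoolWorker), a g that is declared global only after the Pool exists is missing in the children. The sketch below simulates that worker in a single process by temporarily deleting g; it is an illustration, not what multiprocessing literally does.

```python
import pickle

def g(j):
    return 4 * j

# Functions are pickled by reference: only the module and name are stored
payload = pickle.dumps(g)
print(pickle.loads(payload) is g)  # True

# Simulate a worker whose __main__ has no attribute g
saved_g = g
del g
try:
    pickle.loads(payload)
    load_error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    load_error = str(exc)
    print(load_error)  # e.g. Can't get attribute 'g' on <module '__main__' ...>
g = saved_g  # restore the global
```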
Edit: For the example in your question, a callable object would look something like this:
    class Calculator:
        def __init__(self, j):
            self.j = j

        def __call__(self, x):
            return self.j * x
And your function f becomes something like this:
    def f(j):
        calculator = Calculator(j)
        pool = multiprocessing.Pool(processes=1)
        return pool.map(calculator, range(5))
In this case it works just fine. Hope it helps.
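The callable object works for the same reason partial does: an instance is pickled as a reference to its module-level class plus its attribute dictionary, so the multiplier travels with it. A minimal round-trip check, using 3 as an illustrative multiplier:

```python
import pickle

class Calculator:
    def __init__(self, j):
        self.j = j

    def __call__(self, x):
        return self.j * x

# The class is stored by name; self.j is stored as instance state
restored = pickle.loads(pickle.dumps(Calculator(3)))
print(vars(restored))                   # {'j': 3}
print([restored(x) for x in range(5)])  # [0, 3, 6, 9, 12]
```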
Upvotes: 2