Reputation: 21
The goal of the following code is simply to fill array a with 0 to 9:
import multiprocessing
import numpy as np
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()
inputs = range(10)
a = np.zeros(10)

def processInput():
    def testNested(t):
        a[t] = t
    Parallel(n_jobs=num_cores, backend="threading")(delayed(testNested)(t) for t in range(10))

processInput()
I get a pickling error when I try to run the multiprocessing code inside a function:
AttributeError: Can't pickle local object 'processInput.<locals>.testNested'
Question: Any suggestions on how to achieve this goal in cases where I have to run multiprocessing inside other functions?
Upvotes: 1
Views: 2165
Reputation: 16624
As the error message and the documentation state, a nested function cannot be pickled; you should define the worker function at the top level of a module.
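A minimal sketch of the difference (the names square and make_nested are made up for illustration): pickle can serialize a module-level function by reference, but not one defined inside another function, which is exactly the error from the question:

```python
import pickle

def square(x):
    # Defined at module top level, so pickle can find it by name
    return x * x

def make_nested():
    def nested(x):
        return x * x
    return nested  # a local object that pickle cannot locate by name

# The top-level function pickles fine
data = pickle.dumps(square)

# The nested function raises the same "Can't pickle local object" error
try:
    pickle.dumps(make_nested())
except (AttributeError, pickle.PicklingError) as e:
    print("pickling failed:", e)
```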
Upvotes: 2
Reputation: 569
There are examples in the built-in multiprocessing package; I slightly modified the first one, which uses the Pool class:
class multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]])
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.
If initializer is not None then each worker process will call initializer(*initargs) when it starts.
maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.
context can be used to specify the context used for starting the worker processes. Usually a pool is created using the function multiprocessing.Pool() or the Pool() method of a context object. In both cases context is set appropriately.
Note that the methods of the pool object should only be called by the process which created the pool.
Therefore you don't really need to import and use joblib, because Pool uses the number of CPUs in the system by default:
from multiprocessing import Pool

def f(x):
    return x

if __name__ == '__main__':
    with Pool() as p:
        print(p.map(f, list(range(10))))
However, the same result could be achieved with a one-liner:
print(list(range(10)))
I mapped pool p to the function f() in case you want to do something more complex than just assigning values by index.
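If the parallel call really has to live inside another function, as in the question, one alternative sketch (not from either answer above) is multiprocessing.pool.ThreadPool: threads share memory and never pickle the worker, so even a nested function writing into a shared NumPy array works. The name process_input here mirrors the question's processInput and is otherwise arbitrary:

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def process_input():
    a = np.zeros(10)

    def fill(t):
        # Nested is fine here: threads call the function directly,
        # so it is never pickled
        a[t] = t

    with ThreadPool() as pool:
        pool.map(fill, range(10))
    return a

print(process_input())
```

Note that this is concurrency via threads, not processes, so it helps when the work releases the GIL (NumPy, I/O) rather than for pure-Python CPU-bound loops.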
Upvotes: 0