Shirley Hou

Reputation: 21

Pickle in multiprocessing

The goal of the following code is merely to fill array a with 0 to 9:

import multiprocessing
import numpy as np
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()

inputs = range(10)
a = np.zeros(10)

def processInput():
    def testNested(t):
        a[t]= t
    Parallel(n_jobs=num_cores, backend="threading")(delayed(testNested)(t) for t in range(0, 10))

processInput()

I get a pickle error when I try to run multiprocessing inside a function:

AttributeError: Can't pickle local object 'processInput.<locals>.testNested'

Question: Any suggestions on how to achieve this goal in cases where I have to run multiprocessing within other functions?

Upvotes: 1

Views: 2165

Answers (2)

georgexsh

Reputation: 16624

As the error message says and the documentation states, a nested function cannot be pickled; you should define the worker function at the top level of a module.
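
For illustration, here is a minimal sketch of that fix applied to the question's code (my addition, not part of the original answer), with the nested worker hoisted to module level:

import multiprocessing
import numpy as np
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()
a = np.zeros(10)

def fill(t):
    # Defined at the top level of the module, so it can be pickled.
    a[t] = t

def processInput():
    Parallel(n_jobs=num_cores, backend="threading")(
        delayed(fill)(t) for t in range(10)
    )

processInput()
print(a)  # [0. 1. 2. ... 9.]

With the threading backend the workers share the array a, so the assignments are visible after the call returns.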

Upvotes: 2

Lycopersicum

Reputation: 569

There are examples in the built-in multiprocessing package's documentation; I slightly modified the first one, which uses the Pool class:

class multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]])

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

If initializer is not None then each worker process will call initializer(*initargs) when it starts.

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

context can be used to specify the context used for starting the worker processes. Usually a pool is created using the function multiprocessing.Pool() or the Pool() method of a context object. In both cases context is set appropriately.

Note that the methods of the pool object should only be called by the process which created the pool.
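
As a brief illustration of those parameters (the worker and initializer functions here are invented for the example, not taken from the documentation):

from multiprocessing import Pool
import os

def init_worker():
    # Runs once in each worker process when it starts.
    print('worker', os.getpid(), 'ready')

def square(x):
    return x * x

if __name__ == '__main__':
    # processes, initializer and maxtasksperchild set explicitly;
    # Pool() alone would default to os.cpu_count() processes.
    with Pool(processes=2, initializer=init_worker, maxtasksperchild=5) as p:
        print(p.map(square, range(10)))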

Therefore you don't really need to import and use joblib, because Pool uses the number of CPUs in the system by default:

from multiprocessing import Pool

def f(x):
    return x

if __name__ == '__main__':
    with Pool() as p:
        print(p.map(f, list(range(10))))

However, the same result could be achieved with a "one-liner":

print(list(range(10)))

I mapped the pool p over the function f() in case you want to do something more complex than just assigning values to an index.
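
For instance, here is a small sketch (my addition, assuming you still want to end up with the NumPy array from the question) that collects the pool's results into that array:

from multiprocessing import Pool
import numpy as np

def f(x):
    return x  # replace with any per-element computation

if __name__ == '__main__':
    a = np.zeros(10)
    with Pool() as p:
        a[:] = p.map(f, range(10))  # gather worker results into the array
    print(a)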

Upvotes: 0
