user24205

Reputation: 502

Parallel processing with Pool in Python

I've attempted to run parallel processing on a locally defined function as follows:

import multiprocessing as mp
import numpy as np
import pdb


def testFunction():
  x = np.asarray( range(1,10) )
  y = np.asarray( range(1,10) )

  def myFunc( i ):
    return np.sum(x[0:i]) * y[i]

  p = mp.Pool( mp.cpu_count() )
  out = p.map( myFunc, range(0,x.size) )
  print( out )


if __name__ == '__main__':
  print( 'I got here' )
  testFunction()

When I do so, I get the following error:

cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

How can I use multiprocessing to process several arrays in parallel like I'm trying to do here? x and y are necessarily defined inside the function; I'd rather not make them global variables.

All help is appreciated.

Upvotes: 2

Views: 2027

Answers (1)

constt

Reputation: 2320

The error occurs because Pool.map has to pickle the function it sends to the workers, and locally defined (nested) functions can't be pickled. Just make the processing function global and pass pairs of array values instead of referencing them by index in the function:

import multiprocessing as mp

import numpy as np


def process(inputs):
    x, y = inputs

    return x * y


def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(process, zip(x, y))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

UPDATE: According to the new details provided, you need to share the arrays between processes. This is exactly what multiprocessing.Manager is for.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

So the resulting code will look something like this:

from functools import partial
import multiprocessing as mp

import numpy as np


def process(i, x, y):
    return np.sum(x[:i]) * y[i]


def main():
    manager = mp.Manager()

    x = manager.Array('i', range(10))
    y = manager.Array('i', range(10))

    func = partial(process, x=x, y=y)

    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(x)))

    print(out)


if __name__ == '__main__':
    main()

Output:

[0, 0, 2, 9, 24, 50, 90, 147, 224, 324]
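If the workers only need to read x and y, a lighter-weight alternative (no Manager server process or proxy round-trips) is to hand the arrays to each worker once via the Pool's initializer and store them as module-level globals in the worker. This is a sketch of that approach; the names init_worker, _x, and _y are my own, not from the original answer:

```python
import multiprocessing as mp

import numpy as np

# Worker-side globals, populated once per worker by the initializer.
_x = None
_y = None


def init_worker(x, y):
    # Runs once in each worker process, so the arrays are sent
    # only once instead of being proxied on every access.
    global _x, _y
    _x = x
    _y = y


def process(i):
    return np.sum(_x[:i]) * _y[i]


def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))

    with mp.Pool(mp.cpu_count(), initializer=init_worker,
                 initargs=(x, y)) as pool:
        out = pool.map(process, range(len(x)))

    print(out)


if __name__ == '__main__':
    main()
```

This produces the same results as the Manager version, but each worker gets a plain copy of the arrays, so reads are local and fast; it is only suitable when the workers don't need to mutate shared state.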

Upvotes: 3
