gota

Reputation: 2649

Python multiprocessing - How to create a function that parallelizes a for loop

If you open a Jupyter Notebook and run this:

import multiprocessing
def f(x):
    a = 3 * x
    pool = multiprocessing.Pool(processes=1)
    global g
    def g(j):
        return a * j
    return pool.map(g, range(5))
f(1)

you will get the following error:

Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/Users/me/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/Users/me/anaconda3/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/me/anaconda3/lib/python3.5/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/Users/me/anaconda3/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'g' on <module '__main__'>

I'm trying to understand whether this is a bug or a feature.

I need this to work because in my real case f is essentially a for loop that is easy to parallelize (only one parameter changes per iteration), but each iteration takes a long time. Am I approaching the problem correctly, or is there an easier way? (Note: f itself will be called several times throughout the notebook, with different parameters.)

Upvotes: 3

Views: 1707

Answers (2)

Kritzefitz

Reputation: 2754

If you want to apply g to more arguments than only the iterator element passed by pool.map you can use functools.partial like this:

import multiprocessing
import functools

def g(a, j):
    return a * j

def f(x):
    a = 3 * x
    pool = multiprocessing.Pool(processes=1)
    g_with_a = functools.partial(g, a)
    return pool.map(g_with_a, range(5))

f(1)

functools.partial takes a function and an arbitrary number of arguments (positional and keyword) and returns a new callable that behaves like the function you passed in, but only expects the arguments you didn't already give to partial.
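To make that concrete, here is a small sketch without any pool. The function scale and its offset parameter are made up for illustration and are not part of the question.

```python
import functools

def scale(a, j, offset=0):
    # Hypothetical helper, used only to illustrate partial.
    return a * j + offset

# Fix the first positional argument: triple(j) calls scale(3, j).
triple = functools.partial(scale, 3)

# Fix a positional and a keyword argument: calls scale(3, j, offset=1).
triple_plus_one = functools.partial(scale, 3, offset=1)

# The partial object can be used anywhere a one-argument function is expected.
results = list(map(triple, range(5)))
```

The built-in map is used here only to keep the sketch self-contained; pool.map takes the same kind of one-argument callable.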

The object returned by partial can be pickled without problems, i.e. it can be passed to pool.map, as long as you're using Python 3.
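A rough way to check this yourself is a pickle round trip, since pool.map has to pickle the callable to send it to the workers. The sketch below uses operator.mul instead of g so that it does not depend on where it is run; make_local and local_g are hypothetical names added for contrast, and the exact exception type for the nested function may differ between Python versions.

```python
import functools
import operator
import pickle

# A partial built on an importable function survives a pickle round trip.
times_three = functools.partial(operator.mul, 3)
restored = pickle.loads(pickle.dumps(times_three))

def make_local(a):
    # A plain nested function: pickle cannot look it up by name.
    def local_g(j):
        return a * j
    return local_g

try:
    pickle.dumps(make_local(3))
    local_pickled = True
except (AttributeError, pickle.PicklingError):
    local_pickled = False
```

Note that this is a slightly different failure from the one in the question, where g is declared global and the lookup fails on the worker side instead.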

This is essentially the same as what Darth Kotik describes in his answer, but you don't have to implement the Calculator class yourself, because partial already does what you want.

Upvotes: 1

Darth Kotik

Reputation: 2351

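It works just fine if you define g outside of f, at module level. The worker process has to look g up by name in __main__, and in your version the pool is created (and, judging by the ForkPoolWorker traceback, the worker forked) before g is defined, so the worker never sees it.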

import multiprocessing

def g(j):
    return 4 * j

def f():
    pool = multiprocessing.Pool(processes=1)
    return pool.map(g, range(5))

f()

Edit: For the example in your question, you can use a callable object, which would look something like this:

class Calculator():
    def __init__(self, j):
        self.j = j

    def __call__(self, x):
        return self.j*x

and your function f becomes something like this:

def f(j):
    calculator = Calculator(j) 
    pool = multiprocessing.Pool(processes=1)
    return pool.map(calculator, range(5))

In this case it works just fine. Hope this helps.
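As a quick sanity check without a pool, the callable object can be compared with a functools.partial built for the same multiplier. This is only a sketch: the value 3 is an arbitrary choice, and operator.mul stands in for a module-level multiply function.

```python
import functools
import operator

class Calculator:
    def __init__(self, j):
        self.j = j

    def __call__(self, x):
        return self.j * x

calculator = Calculator(3)
with_partial = functools.partial(operator.mul, 3)

# Both callables take a single argument and multiply it by the stored value.
from_class = list(map(calculator, range(5)))
from_partial = list(map(with_partial, range(5)))
```

Either callable can be handed to pool.map in place of the built-in map used here.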

Upvotes: 2
