Michael Gradek

Reputation: 2738

Parallel processing loop using multiprocessing Pool

I want to process a large for loop in parallel, and from what I have read the best way to do this is to use the multiprocessing library that comes standard with Python.

I have a list of around 40,000 objects, and I want to process them in parallel in a separate class. The reason for doing this in a separate class is mainly because of what I read here.

One class holds all the objects in a list. Using multiprocessing.Pool and Pool.map, I want to run a computation on each object in parallel by passing it through another class and getting a value back.

import multiprocessing

# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)  # pool of 4 worker processes
results = pool.map(Parallel, self.list_objects)

And then I have a class which I want to process each object passed by the pool.map function:

class Parallel(object):
    def __init__(self, args):
        self.some_variable          = args[0]
        self.some_other_variable    = args[1]
        self.yet_another_variable   = args[2]
        self.result                 = None

    def __call__(self):
        self.result                 = self.calculate(self.some_variable)

The reason I have a __call__ method is the post I linked above, but I'm not sure I'm using it correctly, since it seems to have no effect: self.result is never computed.

Any suggestions? Thanks!

Upvotes: 1

Views: 2625

Answers (2)

unutbu

Reputation: 879341

Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
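
A minimal sketch of the plain-function approach, assuming each object can be reduced to a picklable tuple of three values. The names process_item and run_parallel, and the arithmetic inside the worker, are hypothetical stand-ins for the real calculation:

```python
import multiprocessing


def process_item(args):
    """Hypothetical worker: unpack one item and return its result."""
    some_variable, some_other_variable, yet_another_variable = args
    # Placeholder arithmetic standing in for the real calculation.
    return some_variable * some_other_variable + yet_another_variable


def run_parallel(items, processes=4):
    """Map process_item over items using a pool of worker processes."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(process_item, items)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    items = [(1, 2, 3), (4, 5, 6)]
    print(run_parallel(items))
```

The worker is defined at module level so that it can be pickled and sent to the worker processes, and it returns its value instead of storing it on an object.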

If you really need to use a class, then given your setup, pass an instance of Parallel:

results = pool.map(Parallel(args), self.list_objects)

Since the instance has a __call__ method, the instance itself is callable, like a function.


By the way, __call__ needs to accept an additional argument:

def __call__(self, val):

since pool.map essentially performs the following, but in parallel:

p = Parallel(args)
result = []
for val in self.list_objects:
    result.append(p(val))
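
Putting these pieces together, a runnable sketch might look like the following. It assumes the instance holds only picklable configuration; the factor attribute and the body of calculate are hypothetical placeholders, not part of the original question:

```python
import multiprocessing


class Parallel(object):
    def __init__(self, factor):
        # Shared configuration; the instance is pickled and sent to the workers.
        self.factor = factor

    def __call__(self, val):
        # Return the value: attributes set in a worker process
        # do not propagate back to the parent process.
        return self.calculate(val)

    def calculate(self, val):
        # Placeholder for the real calculation.
        return val * self.factor


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        # One callable instance is applied to every item in the list.
        results = pool.map(Parallel(10), [1, 2, 3])
    print(results)
```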

Upvotes: 3

loopbackbee

Reputation: 23322

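Pool.map simply applies a function (more precisely, any callable) to each item in parallel. It has no notion of objects or classes. Since you pass it the class itself, each item is handed to the constructor, so only __init__ runs and __call__ is never executed. You need to either call it explicitly from __init__ or use pool.map(Parallel.__call__, preinitialized_objects). Either way, make __call__ return the computed value, because attributes set inside a worker process do not change the objects held by the parent process.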
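
A short sketch of the difference, under the assumption that each item is a tuple and that __call__ is changed to return its value. The module-level helper run and the squaring step are hypothetical, used here only for illustration:

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]
        self.result = None

    def __call__(self):
        # Placeholder calculation; the value is returned as well as stored.
        self.result = self.some_variable ** 2
        return self.result


def run(obj):
    """Hypothetical module-level helper: call one pre-built instance."""
    return obj()


if __name__ == "__main__":
    raw = [(1,), (2,), (3,)]
    with multiprocessing.Pool(4) as pool:
        # Passing the class only constructs instances; __call__ never runs,
        # so every result attribute is still None.
        instances = pool.map(Parallel, raw)
        print([obj.result for obj in instances])
        # Calling each pre-built instance runs the calculation.
        print(pool.map(run, instances))
```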

Upvotes: 2
