user1991

Reputation: 594

Parallel programming array handling in Python

I have seen a number of questions on Python's multiprocessing, but I have not been able to quite wrap my head around how to apply it to my code.

Suppose I have an NxM array. I have a function f that compares the value at (i, j) to every other pixel. So, in essence, I compute NxM values at every point on the grid.

My machine has four cores. I envision that I would split the input locations into four quadrants and then feed each quadrant to a different process.

So, schematically, my current code would be:

def f(x, array):
    """
    For input location x, do some operation 
    with every other array value. As a simple example,
    this could be the product of array[x] with all other
    array values
    """
    return array[x]*array

if __name__ == '__main__':
    array = some_array
    for x in range(array.size):
        returnval = f(x, array)


What would be the best strategy for parallelizing a problem like this with multiprocessing?

Upvotes: 0

Views: 191

Answers (2)

memoselyk

Reputation: 4118

The multiprocessing library can do most of the heavy lifting for you, with its Pool.map function.

The only limitation is that Pool.map expects a function of a single argument, the value being iterated over. Based on this other question, the final implementation would be:

from multiprocessing import Pool
from functools import partial

if __name__ == '__main__':
    array = some_array
    # Bake the array argument in, so the mapped function
    # only takes the single x parameter that Pool.map supplies
    partial_f = partial(f, array=array)
    pool = Pool(processes=5)  # number of CPUs + 1
    returnval = pool.map(partial_f, range(array.size))
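
Note that pool.map blocks until all the workers have finished, and it returns the results in input order, so returnval[x] corresponds to f(x, array).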

Upvotes: 1

Will Angley

Reputation: 1394

The strategy you've described - send the data set to several workers, have each compute values for part of the data set, and stitch them together at the end - is the classical way to parallelize computation for an algorithm like this where answers don't depend on each other. This is what MapReduce does.
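
For concreteness, here is a minimal sketch of that split-and-stitch pattern. The placeholder array, the toy f, and the work_on_chunk helper are illustrative stand-ins rather than code from the question:

from multiprocessing import Pool

import numpy as np

array = np.arange(12.0)  # stand-in for some_array

def f(x, arr):
    # Toy operation from the question: the product of one
    # value with every other array value
    return arr[x] * arr

def work_on_chunk(indices):
    # Each worker computes f for its own slice of input locations
    return [f(x, array) for x in indices]

if __name__ == '__main__':
    # Split the input locations into one chunk per core ...
    chunks = np.array_split(np.arange(array.size), 4)
    with Pool(processes=4) as pool:
        # ... compute each chunk in a separate worker process ...
        per_chunk = pool.map(work_on_chunk, chunks)
    # ... then stitch the partial results back together in order
    results = [row for chunk in per_chunk for row in chunk]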

That said, introducing parallelism makes programs harder to understand, and harder to adapt or improve later. Both the algorithm you've described, and the use of raw Python for large numerical computations, might be better things to change first.

Consider checking whether expressing your calculation with numpy arrays makes it run faster, and whether you can find or develop a more efficient algorithm for the problem you're trying to solve, before parallelizing your code.
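
For the toy f in the question (pairwise products), the NumPy route collapses the whole computation into a single vectorized call once the grid is flattened; the array below is a placeholder:

import numpy as np

array = np.random.rand(4, 5)  # placeholder NxM grid
flat = array.ravel()          # treat the grid as N*M values

# Row x of this matrix is f(x, array) flattened: every pairwise
# product, computed in one call with no Python-level loop
all_products = np.outer(flat, flat)  # shape (N*M, N*M)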

Upvotes: 0
