Reputation: 594
I have seen a number of questions on using Python's multiprocessing,
but I have not been able to quite wrap my head around how to use it on my code.
Suppose I have an NxM array. I have a function f
that compares the value at (i,j) to every other pixel. So, in essence, I compute NxM values at every point on the grid.
My machine has four cores. I envision that I would split the input locations into four quadrants and then feed each quadrant to a different process.
So, schematically, my current code would be:
def f(x, array):
    """
    For input location x, do some operation
    with every other array value. As a simple example,
    this could be the product of array[x] with all other
    array values.
    """
    return array[x] * array

if __name__ == '__main__':
    array = some_array
    for x in range(array.size):
        returnval = f(x, array)
What would be the best strategy for optimizing a problem like this using multiprocessing?
Upvotes: 0
Views: 191
Reputation: 4118
The multiprocessing library can do most of the heavy lifting for you, with its Pool.map function.
The only limitation is that Pool.map expects a function that takes a single argument: the value you are iterating over. Based on this other question, the final implementation would be:
from multiprocessing import Pool
from functools import partial

# f is the function defined in the question.
# Make a version of f that only takes the x parameter.
partial_f = partial(f, array=some_array)

if __name__ == '__main__':
    pool = Pool(processes=5)  # number of CPUs + 1
    returnval = pool.map(partial_f, range(some_array.size))
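(A side note, not from the original answer: on Python 3 the pool can also be used as a context manager, which shuts the worker processes down automatically once the results are collected.)

if __name__ == '__main__':
    with Pool(processes=5) as pool:
        returnval = pool.map(partial_f, range(some_array.size))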
Upvotes: 1
Reputation: 1394
The strategy you've described - send the data set to several workers, have each compute values for part of the data set, and stitch them together at the end - is the classical way to parallelize computation for an algorithm like this where answers don't depend on each other. This is what MapReduce does.
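As a rough sketch of that split-and-stitch pattern (not from the original answer; the worker function and the chunk count are made up for illustration):

from multiprocessing import Pool
import numpy as np

def worker(chunk, array):
    # Compute f for every input location in this chunk.
    return [array[x] * array for x in chunk]

if __name__ == '__main__':
    array = np.arange(16.0)
    # One chunk of input locations per core.
    chunks = np.array_split(np.arange(array.size), 4)
    with Pool(processes=4) as pool:
        parts = pool.starmap(worker, [(c, array) for c in chunks])
    # Stitch the per-chunk results back together in order.
    results = [r for part in parts for r in part]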
That said, introducing parallelism makes programs harder to understand, and harder to adapt or improve later. Both the algorithm you've described, and the use of raw Python for large numerical computations, might be better things to change first.
Consider checking whether expressing your calculation with numpy arrays makes it run faster, and whether you can find or develop a more efficient algorithm for the problem you're trying to solve, before parallelizing your code.
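For instance, if f really is the elementwise product from the question, the whole computation collapses to a single outer product (a sketch, assuming the array and the size-squared result both fit in memory):

import numpy as np

array = np.arange(12.0).reshape(3, 4)  # stand-in for some_array
flat = array.ravel()

# result[i, j] == flat[i] * flat[j], i.e. f(x, array) for every x at once.
result = np.multiply.outer(flat, flat)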
Upvotes: 0