fahadh4ilyas

Reputation: 480

How to Do Data Parallelization in Python?

So, I have a 3-dimensional list. For example:

A=[[[1,2,3],[4,5,6],[7,8,9]],...,[[2,4,1],[1,4,6],[1,2,4]]]

I want to process each 2-dimensional list in A independently, and they all go through the same process. If I do it sequentially, I write:

for i in range(len(A)):
    A[i]=process(A[i])

But it takes a very long time. Could you tell me how to compute this in parallel using data parallelization in Python?

Upvotes: 1

Views: 670

Answers (1)

niemmi

Reputation: 17263

If you have multiple cores and processing each 2-dimensional list is an expensive operation, you could use Pool from multiprocessing. Here's a short example that squares the numbers in separate processes:

import multiprocessing as mp

A = [[[1,2,3],[4,5,6],[7,8,9]],[[2,4,1],[1,4,6],[1,2,4]]]

def square(l):
    # Square every number in one 2-dimensional list
    return [[x * x for x in sub] for sub in l]

if __name__ == '__main__':
    # The guard is required on platforms that spawn worker processes
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # Each 2-dimensional list in A is sent to a worker process
        res = pool.map(square, A)
    print(res)

Output:

[[[1, 4, 9], [16, 25, 36], [49, 64, 81]], [[4, 16, 1], [1, 16, 36], [1, 4, 16]]]

Pool.map behaves like the built-in map while splitting the iterable across the worker processes. It also accepts an optional third parameter, chunksize, that defines how many items are submitted to a worker at a time.
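For the list from your question, a minimal sketch could look like the following; process here is only a placeholder standing in for your real computation, and chunksize=2 is an arbitrary value you would tune for your workload:

import multiprocessing as mp

def process(l):
    # Placeholder: replace with your own per-list computation
    return [[x + 1 for x in sub] for sub in l]

A = [[[1,2,3],[4,5,6],[7,8,9]],[[2,4,1],[1,4,6],[1,2,4]]]

if __name__ == '__main__':
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # chunksize controls how many items each worker receives per task;
        # larger chunks mean fewer hand-offs when each call is cheap
        A = pool.map(process, A, chunksize=2)
    print(A)

Larger chunk sizes reduce inter-process communication overhead, which matters when each individual call is fast, while smaller chunks balance the load better when calls take uneven amounts of time.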

Upvotes: 2
