Reputation: 119
I have a for loop that performs an operation on each element of an array. The array has 1e5 elements:
import numpy as np
A = np.array([1,2,3,4..........100000])
for i in range(0, len(A)):
    A[i] = (A[i]*2 + A[i]*4)**(1/3)
I want to parallelise the code above so that each iteration of the for loop runs on a different core, making the code faster. I have a workstation with 48 cores. How can I achieve this parallel processing in Python? Please help.
Upvotes: 0
Views: 81
Reputation: 155684
Don't bother parallelizing just yet. Right now, you're taking no advantage of numpy vectorization; you may as well be using a Python list (or maybe array.array) for all the benefit numpy is giving you.
Actually use the vectorization features, and the overhead should drop by roughly two orders of magnitude:
import numpy as np
A = np.array([1,2,3,4..........100000]) # If this is actually the values you want, use np.arange(1, 100000+1) to speed it up
A = (A * 6) ** (1 / 3)
# If the result should truncate back to int64, not convert to doubles, cast back at the end
A = A.astype(np.int64)
(A * 6) ** (1 / 3) does the same work as the for loop did, but much faster. You could match the original code more closely with A = (A * 2 + A * 4) ** (1/3), but multiplying by 2 and 4 separately and adding the results is pointless when you could just multiply by 6 directly. The final line (optional, depending on intent) reproduces the behavior of the original loop by truncating back to the original integer dtype.
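As a rough sanity check that the vectorized expression computes the same values as the loop, here is a minimal sketch. The size n = 1000 and the names values, looped, and vectorized are my own choices for illustration, not from the question. The loop results are stored in a float array so the comparison doesn't depend on how values very close to a whole number get truncated.

```python
import numpy as np

n = 1000  # assumed small size for a quick check; the question uses 100000

values = np.arange(1, n + 1)

# Element-by-element version of the question's formula, keeping float results
looped = np.empty(n, dtype=np.float64)
for i in range(n):
    looped[i] = (values[i] * 2 + values[i] * 4) ** (1 / 3)

# Vectorized version: one expression applied to the whole array at once
vectorized = (values * 6) ** (1 / 3)

# Compare with a tolerance, since these are floating-point results
print(np.allclose(looped, vectorized))
```

If you need integers at the end, apply .astype(...) to the vectorized result once, rather than truncating inside a loop.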
Comparing performance with IPython's %%timeit magic for a microbenchmark:
In [2]: %%timeit
...: A = np.arange(1, 100000+1)
...: for i in range(len(A)):
...:     A[i] = (A[i]*2 + A[i]*4) ** (1/3)
...:
427 ms ± 6.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %%timeit
...: A = np.arange(1, 100000+1)
...: A = (A * 6) ** (1/3)
...:
2.72 ms ± 51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The vectorized code takes about 0.6% of the time taken by the naive loop; merely parallelizing the naive loop would never come close to achieving that sort of speedup. Adding the .astype(np.int64) cast only increases runtime by about 6%, still a trivial fraction of what the original for loop required.
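If you aren't using IPython, a similar comparison can be sketched with the standard library's timeit module. This is only an illustration: the function names loop_version and vectorized_version, the reduced size N, and the repeat counts are assumptions of mine, and the timings will vary by machine.

```python
import timeit

import numpy as np

N = 10_000  # assumed size, smaller than the question's 100000 to keep the run short


def loop_version(n=N):
    # Naive element-by-element loop, as in the question
    a = np.arange(1, n + 1)
    for i in range(len(a)):
        a[i] = (a[i] * 2 + a[i] * 4) ** (1 / 3)
    return a


def vectorized_version(n=N):
    # Whole-array expression; the result is a float array
    a = np.arange(1, n + 1)
    return (a * 6) ** (1 / 3)


# Take the best of a few repeats to reduce noise from other processes
loop_time = min(timeit.repeat(loop_version, number=1, repeat=3))
vec_time = min(timeit.repeat(vectorized_version, number=10, repeat=3)) / 10

print(f"loop:       {loop_time * 1e3:.3f} ms per call")
print(f"vectorized: {vec_time * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats is a common way to get a stable number from timeit; the absolute values matter less than the ratio between the two.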
Upvotes: 4