Reputation: 3001
I am new to python
and currently studying the numPy
package. I come from the C/C++
world, so maybe my question is stupid. When using vectorized operations in numPy
, I assume that they parallelize the execution like openMP
does.
I came across a piece of code in an udacity tutorial, which calculated a standardized 1D-array in the following way:
standardized = (array - array.mean()) / array.std()
where array is a numPy
array. So in my eyes numPy
would parallelize the following 'single' instructions to get a better performance:
standardized[0] = (array[0] - array.mean()) / array.std()
standardized[1] = (array[1] - array.mean()) / array.std()
...
...
standardized[n] = (array[n] - array.mean()) / array.std()
where n
is the size of the array
. So in every iteration, I would call mean()
and std()
which gets always calculated and therefore needs a lot of time. In a 'C way'
I would do something like this, to increase performance:
mean = array.mean()
std = array.std()
standardized = (array - mean) / std
I measured times for both calculations and nearly got always the same time. In fact, it depends on which method I use first, which is the fastest. Additionally, I only used array filled with zeros, maybe this has an impact, too.
So my question is, how does python
(or numPy
) 'parallalize' the vectorized execution and how does it deal with function calls, which should always return the same value in one iteration.
I hope my questions are clear and understandable. I could not find any sources which deals with this use-case.
Upvotes: 1
Views: 113
Reputation: 231510
standardized = (array - array.mean()) / array.std()
is a Python expression which gets evaluated as:
temp1 = array.mean()
temp2 = array.std()
temp3 = (array - temp1)
temp4 = temp3 / temp2
array.mean
is a numpy 'builtin' method, which means it's written in compiled code. Same for std
. And for subtraction and division of two arrays.
numpy
provides building blocks, python provides the glue to join them together. Generally the best strategy is to maximize the use of those numpy methods. And avoid loops at the Python level. Sometimes a few loops on a complex operation is better, and sometimes using basic Python is better (creating an array from lists takes time).
There are tools for building custom compiled blocks - cython
, numba
etc.
Upvotes: 2
Reputation: 6395
I'm not aware of any OpenMP-style parallelization in numpy. Speed-gains come from using C/Fortran/specialised libraries such as LAPack/BLAS etc. You can roll your own parallelization using multiprocessing if you can afford the marshaling cost.
There seems to be a way to enable OpenMP if you build yourself: https://docs.scipy.org/doc/scipy/reference/building/linux.html
Upvotes: 0