RoQuOTriX

Reputation: 3001

Does calling a numpy function in a vectorized operation affect performance?

I am new to Python and currently studying the NumPy package. I come from the C/C++ world, so maybe my question is stupid. When using vectorized operations in NumPy, I assume that they parallelize the execution like OpenMP does.

I came across a piece of code in a Udacity tutorial that calculated a standardized 1D array in the following way:

standardized = (array - array.mean()) / array.std()

where array is a NumPy array. So in my eyes, NumPy would parallelize the following 'single' instructions to get better performance:

standardized[0] = (array[0] - array.mean()) / array.std()
standardized[1] = (array[1] - array.mean()) / array.std()
...
...
standardized[n] = (array[n] - array.mean()) / array.std()

where n is the size of the array. So in every iteration I would call mean() and std(), which would be recalculated every time and therefore cost a lot of time. In a 'C way' I would do something like this to increase performance:

mean = array.mean()
std = array.std()
standardized = (array - mean) / std

I measured the times for both calculations and got nearly the same result every time. In fact, which method came out fastest depended on which one I ran first. Additionally, I only used arrays filled with zeros; maybe this has an impact, too.
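
For reference, here is a minimal way to time both versions with timeit (the array size, data, and repeat count are arbitrary choices; random data also sidesteps the all-zeros concern):

import numpy as np
from timeit import timeit

array = np.random.rand(1_000_000)  # random data instead of zeros, arbitrary size

def one_liner():
    return (array - array.mean()) / array.std()

def precomputed():
    mean = array.mean()
    std = array.std()
    return (array - mean) / std

# run each version many times so warm-up effects do not dominate
print("one-liner:  ", timeit(one_liner, number=100))
print("precomputed:", timeit(precomputed, number=100))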

So my question is: how does Python (or NumPy) 'parallelize' the vectorized execution, and how does it deal with function calls that should always return the same value in every iteration?

I hope my questions are clear and understandable. I could not find any source that deals with this use case.

Upvotes: 1

Views: 113

Answers (2)

hpaulj

Reputation: 231510

standardized = (array - array.mean()) / array.std()

is a Python expression which gets evaluated as:

temp1 = array.mean()     
temp2 = array.std()
temp3 = (array - temp1)
temp4 = temp3 / temp2

array.mean is a numpy 'builtin' method, which means it's written in compiled code. Same for std, and for subtraction and division of two arrays. Note that mean and std are each evaluated exactly once, before the elementwise subtraction and division loop over the array in compiled code.
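
To make that concrete, a quick sketch (the random test array is just an arbitrary example) showing that the one-liner and the spelled-out temporaries give the same result:

import numpy as np

array = np.random.rand(1_000_000)  # arbitrary example data

# one expression: mean() and std() each run exactly once
standardized = (array - array.mean()) / array.std()

# the same computation with the temporaries spelled out
temp1 = array.mean()
temp2 = array.std()
temp3 = array - temp1
temp4 = temp3 / temp2

assert np.allclose(standardized, temp4)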

numpy provides the building blocks; Python provides the glue to join them together. Generally the best strategy is to maximize the use of those numpy methods and avoid loops at the Python level. Sometimes a few loops around a complex operation are better, and sometimes using basic Python is better (creating an array from lists takes time).
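
As a rough illustration of the gap, a small benchmark sketch comparing a Python-level loop against the vectorized expression (array size and repeat count are arbitrary choices, not from the original answer):

import numpy as np
from timeit import timeit

array = np.random.rand(100_000)  # arbitrary size

def python_loop():
    mean, std = array.mean(), array.std()
    out = np.empty_like(array)
    for i in range(array.size):          # interpreted loop, one element at a time
        out[i] = (array[i] - mean) / std
    return out

def vectorized():
    return (array - array.mean()) / array.std()  # compiled loops inside numpy

print("python loop:", timeit(python_loop, number=10))
print("vectorized: ", timeit(vectorized, number=10))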

There are tools for building custom compiled blocks: Cython, Numba, etc.
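
For example, a minimal sketch with Numba (assuming numba is installed; the function name is mine, not from any library):

import numpy as np
from numba import njit

@njit
def standardize(arr):
    # compiled by numba: mean and std computed once, then one elementwise pass
    mean = arr.mean()
    std = arr.std()
    return (arr - mean) / std

standardized = standardize(np.random.rand(1_000_000))  # first call triggers compilation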

Upvotes: 2

deets

Reputation: 6395

I'm not aware of any OpenMP-style parallelization in numpy. The speed gains come from using C/Fortran/specialized libraries such as LAPACK/BLAS, etc. You can roll your own parallelization using multiprocessing if you can afford the marshaling cost.
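
A minimal sketch of that DIY route (the chunk count and pool size are arbitrary assumptions):

import numpy as np
from multiprocessing import Pool

def standardize_chunk(args):
    chunk, mean, std = args
    return (chunk - mean) / std

if __name__ == "__main__":
    array = np.random.rand(4_000_000)         # arbitrary size
    mean, std = array.mean(), array.std()     # global stats, computed once up front
    chunks = np.array_split(array, 4)         # 4 chunks for 4 workers: arbitrary choice
    # each chunk is pickled to a worker process; that is the marshaling cost
    with Pool(4) as pool:
        parts = pool.map(standardize_chunk, [(c, mean, std) for c in chunks])
    standardized = np.concatenate(parts)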

There seems to be a way to enable OpenMP if you build numpy yourself: https://docs.scipy.org/doc/scipy/reference/building/linux.html

Upvotes: 0
