Does calling a numpy function in a vectorized operation affect performance?

Question

I am new to Python and currently studying the NumPy package. I come from the C/C++ world, so maybe my question is stupid. When using vectorized operations in NumPy, I assume that they parallelize the execution like OpenMP does.

I came across a piece of code in a Udacity tutorial, which calculated a standardized 1D array in the following way:

standardized = (array - array.mean()) / array.std()

where array is a NumPy array. So in my eyes NumPy would parallelize the following 'single' instructions to get better performance:

standardized[0] = (array[0] - array.mean()) / array.std()
standardized[1] = (array[1] - array.mean()) / array.std()
...
...
standardized[n] = (array[n] - array.mean()) / array.std()

where n is the size of the array. So in every iteration, I would call mean() and std(), which would be recalculated each time and therefore take a lot of time. In a 'C way' I would do something like this to increase performance:

mean = array.mean()
std = array.std()
standardized = (array - mean) / std

I measured the times for both calculations and almost always got the same result. In fact, which one is faster depends on which method I run first. Additionally, I only used arrays filled with zeros; maybe this has an impact, too.

So my question is: how does Python (or NumPy) 'parallelize' the vectorized execution, and how does it deal with function calls that should return the same value in every iteration?

I hope my questions are clear and understandable. I could not find any sources that deal with this use case.

hpaulj · Accepted Answer

standardized = (array - array.mean()) / array.std()

is a Python expression which gets evaluated as:

temp1 = array.mean()
temp2 = array.std()
temp3 = (array - temp1)
temp4 = temp3 / temp2

array.mean is a numpy 'builtin' method, which means it's written in compiled code. The same goes for std, and for the subtraction and division applied to the array. Each of these is called once on the whole array, not once per element.
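
One way to check the evaluation order described above is to count the calls. The following is only a minimal sketch: counted_mean, counted_std and the calls dictionary are hypothetical helpers introduced for illustration, and the array contents are arbitrary.

```python
import numpy as np

# Hypothetical call counters, used only for this illustration.
calls = {"mean": 0, "std": 0}

def counted_mean(a):
    """Wrap a.mean() and record how often it is invoked."""
    calls["mean"] += 1
    return a.mean()

def counted_std(a):
    """Wrap a.std() and record how often it is invoked."""
    calls["std"] += 1
    return a.std()

array = np.arange(1_000_000, dtype=float)

# One-line form: Python evaluates each call once, then NumPy
# applies the subtraction and division to the whole array.
standardized = (array - counted_mean(array)) / counted_std(array)

# 'C way' form with the scalars hoisted into variables.
mean = array.mean()
std = array.std()
standardized_hoisted = (array - mean) / std

print(calls)
print(np.array_equal(standardized, standardized_hoisted))
```

Each counter ends at 1 regardless of the array length, and both forms produce the same array, which is consistent with the two versions in the question taking about the same time.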

numpy provides the building blocks, and Python provides the glue to join them together. Generally the best strategy is to maximize the use of those numpy methods and avoid loops at the Python level. Sometimes a few loops around a complex operation are better, and sometimes using basic Python is better (creating an array from lists takes time).
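
To illustrate the cost of looping at the Python level, here is a rough sketch that standardizes the same array with an explicit loop and with the whole-array expression. The function names, array size and repeat count are arbitrary choices for this example, and the measured times will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
array = rng.normal(size=100_000)

def standardize_loop(a):
    """Element-by-element loop at the Python level."""
    mean = a.mean()
    std = a.std()
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = (a[i] - mean) / std
    return out

def standardize_vectorized(a):
    """Whole-array expression; the looping happens in compiled code."""
    return (a - a.mean()) / a.std()

loop_result = standardize_loop(array)
vec_result = standardize_vectorized(array)
print("same result:", np.allclose(loop_result, vec_result))

# Time a few repetitions of each variant.
loop_time = timeit.timeit(lambda: standardize_loop(array), number=3)
vec_time = timeit.timeit(lambda: standardize_vectorized(array), number=3)
print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```

Note that the loop version already hoists mean and std out of the loop, so the remaining difference comes from the per-element Python overhead rather than from repeated mean() or std() calls.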

There are tools for building custom compiled blocks, such as Cython and Numba.
