What is an efficient (speed-wise) way to apply piecewise functions to a NumPy array?
Say, for example, the piecewise function is:
(1) for x <= 2: f(x) = 2*x + 2^x
(2) for x > 2:  f(x) = -(x^2 + 2)
Here's what I did:
import numpy as np

data = np.random.randint(1, 6, size=(5, 6))
print(data)
np.piecewise(data, [data <= 2, data > 2],
             [lambda x: 2*x + 2**x,
              lambda x: -(x**2 + 2)])
data =
[[4 2 1 1 5 3]
[4 3 3 5 4 5]
[3 2 4 2 5 3]
[2 5 4 3 1 4]
[5 3 3 5 5 5]]
output =
array([[-18, 8, 4, 4, -27, -11],
[-18, -11, -11, -27, -18, -27],
[-11, 8, -18, 8, -27, -11],
[ 8, -27, -18, -11, 4, -18],
[-27, -11, -11, -27, -27, -27]])
Is there an efficient approach for small arrays, large arrays, many functions, etc.? My concern is with the lambda functions being used; I'm not sure whether they are NumPy-optimised.
Upvotes: 5
Views: 4965
Reputation: 2196
In this case, you should not be concerned about the lambdas: NumPy optimisation is about reducing call overhead by letting functions evaluate many values at once, in batch. In each call to np.piecewise, each function in funclist (the function parts) is called exactly once, with a NumPy array consisting of all the values where the corresponding condition is true. Thus, these lambdas are called in a NumPy-optimised way.
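A quick way to see this batching is a small sketch using named functions instead of lambdas, so the calls can be counted (the sample data here is made up):

```python
import numpy as np

calls = []

def low(x):
    # Receives a single array of all the elements where x <= 2 held.
    calls.append(x.size)
    return 2*x + 2**x

def high(x):
    # Receives a single array of all the elements where x > 2 held.
    calls.append(x.size)
    return -(x**2 + 2)

data = np.array([1, 2, 3, 4, 5])
out = np.piecewise(data, [data <= 2, data > 2], [low, high])
# Each function is called exactly once: low with 2 elements, high with 3.
```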
np.select works similarly (as does np.where for exactly two parts). The call overhead is the same, as it is vectorised the same way, but it evaluates all functions for all data points. It will therefore be slower than np.piecewise, particularly when the functions are expensive. In some cases it is more convenient (no lambdas needed), and the approach extends more easily to several variables.
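For comparison, a minimal sketch of the same piecewise function written with np.select and np.where (the sample data is made up):

```python
import numpy as np

data = np.array([[4, 2, 1], [1, 5, 3]])

# np.select picks, element-wise, the first choice whose condition is true.
# Note that both branch expressions are evaluated for every element first.
result_select = np.select([data <= 2, data > 2],
                          [2*data + 2**data, -(data**2 + 2)])

# With exactly two branches, np.where expresses the same thing.
result_where = np.where(data <= 2, 2*data + 2**data, -(data**2 + 2))
```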
Upvotes: 4