Reputation: 158
Given the code examples below, one produces the expected result and the other raises an error. This seems confusing for a beginner (i.e. me). I assume the arithmetic operations work element-wise but others don't. What's a "good" (i.e. efficient), general way to perform operations on the elements of a multi-dimensional array without needing underlying knowledge of how the array behaves?
import numpy as np
data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(data)
my_function = lambda x: x*2+5
result = my_function(data)
print(result)
Output:
[[1 2 3 4]
 [5 6 7 8]]
[[ 7  9 11 13]
 [15 17 19 21]]
import numpy as np
data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(data)
my_function = lambda x: x if x < 3 else 0
result = my_function(data)
print(result)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Edit: I am not looking for a particular solution. Yes, I can use np.where or some other mechanism for this exact example. I am asking about lambdas in particular and how their behavior seems ambiguous to the user. If it helps, the lambda / filter comes from the command line, outside the module, so it can be anything the user wants to transform the original array with: something as simple as squaring each element, or as involved as calling an API and using its output to determine the replacement value. You get the idea.
Running Python 3.9.13
Upvotes: 0
Views: 234
Reputation: 231385
This works because operators like * and + work element-wise for numpy arrays:
In [101]: data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
...: print(data)
...:
...: my_function = lambda x: x*2+5
...:
...: result = my_function(data)
[[1 2 3 4]
 [5 6 7 8]]
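A minimal check of that claim, using nothing beyond NumPy itself: calling the arithmetic lambda once on the whole array gives the same result as calling it on every scalar element one at a time and rebuilding the array. The names whole and per_element are just illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
my_function = lambda x: x * 2 + 5

# Whole-array call: NumPy applies * and + to every element at once
whole = my_function(data)

# Element-by-element call: apply the same lambda to each scalar, then rebuild
per_element = np.array([[my_function(v) for v in row] for row in data])

print(whole)
print(np.array_equal(whole, per_element))  # True
```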
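my_function = lambda x: x if x < 3 else 0 fails because the if/else in that expression is inherently a scalar operation. It does not iterate; it needs a single True/False value, so Python calls bool() on the result of x < 3. For an array, that comparison is itself an array of booleans, so its truth value is ambiguous: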
In [103]: data<3
Out[103]:
array([[ True,  True, False, False],
       [False, False, False, False]])
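A small sketch of where the error actually comes from: the conditional expression implicitly calls bool() on the boolean mask, and bool() of a multi-element array raises the ValueError from the question. The .any() and .all() reductions suggested in the message collapse the mask to one value, but each answers a different question, which is why NumPy refuses to guess.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
mask = data < 3  # boolean array, one True/False per element

# x if cond else y needs ONE truth value, so Python effectively calls bool(cond)
try:
    bool(mask)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Reducing the mask explicitly gives an unambiguous single value
print(mask.any())  # True: at least one element is < 3
print(mask.all())  # False: not every element is < 3
```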
np.vectorize is the most general tool for applying a scalar function element-wise to an array (or arrays):
In [104]: f = np.vectorize(my_function, otypes=[int])
In [105]: f(data)
Out[105]:
array([[1, 2, 0, 0],
       [0, 0, 0, 0]])
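Since the question's edit says the function comes from outside the module and can be anything, one way to package this is a small wrapper around np.vectorize. This is only a sketch: apply_elementwise and square are hypothetical names, and the otype argument assumes the caller knows the output type in advance.

```python
import numpy as np

def apply_elementwise(func, arr, otype=None):
    """Hypothetical helper: apply a scalar function to every element of arr.

    np.vectorize loops over the elements in Python and preserves arr's shape.
    If otype is given, it fixes the output dtype instead of letting
    vectorize infer it from the first result.
    """
    otypes = [otype] if otype is not None else None
    return np.vectorize(func, otypes=otypes)(arr)

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# The lambda from the question, which fails when called on the whole array
threshold = apply_elementwise(lambda x: x if x < 3 else 0, data, int)

# Any other scalar callable works the same way, e.g. a named function
def square(x):
    return x * x

squared = apply_elementwise(square, data, int)

print(threshold)
print(squared)
```

The wrapper does not make the function any faster; it only removes the need to know whether func happens to be array-aware.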
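I included the otypes parameter to avoid one of the more common vectorize pitfalls that people ask about on SO: without it, vectorize calls the function on the first element to decide the output dtype, which can silently truncate later results.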
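A minimal illustration of that pitfall, with an invented function half_or_zero chosen so that the first element yields an int and later elements yield floats. Without otypes, the inferred integer dtype truncates 1.5 to 1; with otypes=[float] the output dtype is fixed up front.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
half_or_zero = lambda x: 0 if x < 2 else x / 2  # int for the first element, floats after

# Without otypes, vectorize infers the output dtype from the first call (int 0),
# so the later float results are cast to int.
guessed = np.vectorize(half_or_zero)(data)

# With otypes, the output dtype is fixed in advance and nothing is truncated.
explicit = np.vectorize(half_or_zero, otypes=[float])(data)

print(guessed)   # [0 1 1 2]: 1.5 became 1
print(explicit)  # keeps 1.5
```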
np.vectorize is slower than plain iteration for small cases, but becomes competitive with large ones. Its main advantage, though, is that it's simpler to use for multidimensional arrays. It's even better when the function takes several inputs and you want to take advantage of broadcasting.
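A sketch of the multi-input case, assuming a per-row threshold is what you want (the limits array and the name keep_below are illustrative, not from the question). vectorize broadcasts the (2, 4) data against a (2, 1) column, or against a plain scalar. The explicit np.ndindex loop below computes the same thing and shows the index bookkeeping vectorize saves you.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
limits = np.array([[3], [7]])  # shape (2, 1): one threshold per row

# Two-argument scalar function; vectorize broadcasts (2, 4) against (2, 1)
keep_below = np.vectorize(lambda x, limit: x if x < limit else 0, otypes=[int])
result = keep_below(data, limits)
scalar_limit = keep_below(data, 3)  # a scalar broadcasts too

# The same computation with plain iteration needs explicit indexing
looped = np.empty_like(data)
for idx in np.ndindex(data.shape):
    looped[idx] = data[idx] if data[idx] < limits[idx[0], 0] else 0

print(result)
print(np.array_equal(result, looped))  # True
```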
Upvotes: 3