Reputation: 7974
I have a function which does this: it takes a given numpy array A and a given function func and applies the function to each element of the array.
def transform(A, func):
    return func(A)
A and func are supplied from outside and I do not have control over them. I would like this to work with vectorized functions, such as transform(A, np.sin), but I also want to be able to accept normal Python functions, e.g. lambdas like transform(A, lambda x: x**2 if x > 5 else 0). Of course the second is not vectorized, so I would need to call np.vectorize() before applying it, like this: transform(A, np.vectorize(lambda x: x**2 if x > 5 else 0)). But I do not want to impose this burden on the users. I would like a unified approach to all functions: I just get a function from outside and apply it.
Is there a method to decide which function requires vectorization and which does not? Something like:
def transform(A, func):
    if requires_vectorization(func):  # how to do this???
        func = np.vectorize(func)
    return func(A)
Or should I just vectorize everything by default?
def transform(A, func):
    func = np.vectorize(func)  # is this correct and efficient?
    return func(A)
Is this solution good? In other words, does it hurt to call np.vectorize on an already vectorized function? Or is there any alternative?
Upvotes: 4
Views: 1063
Reputation: 74172
Following the EAFP principle, you could first try calling the function directly on A and see if this raises an exception:
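import numpy as np

def transform(A, func):
    try:
        # Fast path: func already accepts whole arrays
        return func(A)
    except TypeError:
        # Fallback: wrap the scalar-only function and apply it elementwise
        return np.vectorize(func)(A)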
For example:
import math
A = np.linspace(0, np.pi, 5)
print(transform(A, np.sin)) # vectorized function
# [ 0.00000000e+00 7.07106781e-01 1.00000000e+00 7.07106781e-01
# 1.22464680e-16]
print(transform(A, math.sin)) # non-vectorized function
# [ 0.00000000e+00 7.07106781e-01 1.00000000e+00 7.07106781e-01
# 1.22464680e-16]
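One caveat worth sketching: not every scalar-only function fails with a TypeError. The lambda from the question uses `x > 5` inside an `if`, and calling that on an array with more than one element raises a ValueError (the truth value of the array is ambiguous). The variant below is only a sketch under that assumption; the name `transform_safe` is hypothetical and not part of the original answer.

```python
import numpy as np

def transform_safe(A, func):
    """Hypothetical variant of transform that also falls back on ValueError."""
    try:
        # Fast path: func already accepts whole arrays
        return func(A)
    except (TypeError, ValueError):
        # math.sin-style functions raise TypeError on an array, while a
        # conditional such as `x**2 if x > 5 else 0` raises ValueError.
        # Note: np.vectorize infers the output dtype from the first result
        # unless otypes is given, so mixed int/float results may be cast.
        return np.vectorize(func)(A)

A = np.array([1.0, 6.0, 10.0])
print(transform_safe(A, lambda x: x**2 if x > 5 else 0))
```

The trade-off is that a genuine TypeError or ValueError raised inside an already vectorized function also triggers the fallback, so the error then surfaces from the np.vectorize call instead.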
does it hurt to call np.vectorize on already vectorized function?
Yes, absolutely. When you apply np.vectorize to a function, all of the looping over input array elements is done in Python, unlike in "proper" vectorized numpy functions which do their looping in C. From the documentation:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
I feel like this sentence should be written in bold all-caps.
Case in point:
In [1]: vecsin = np.vectorize(np.sin)
In [2]: %%timeit A = np.random.randn(10000);
np.sin(A)
....:
1000 loops, best of 3: 243 µs per loop
In [3]: %%timeit A = np.random.randn(10000);
vecsin(A)
....:
100 loops, best of 3: 11.7 ms per loop
In [4]: %%timeit A = np.random.randn(10000);
[np.sin(a) for a in A]
....:
100 loops, best of 3: 12.5 ms per loop
In this example, applying np.vectorize to np.sin slows it down by a factor of ~50, making it about as slow as a regular Python list comprehension.
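To make the reason for the slowdown concrete, here is a rough sketch of the kind of per-element loop that the documentation describes. It is a simplified stand-in, not numpy's actual implementation: the helper name `loop_apply` is hypothetical, and it assumes the function returns floats.

```python
import math
import numpy as np

def loop_apply(func, A):
    """Simplified stand-in for np.vectorize(func)(A): one Python call per element."""
    A = np.asarray(A)
    out = np.empty(A.shape, dtype=float)  # assumption: func returns floats
    for idx in np.ndindex(A.shape):
        # Each iteration pays the full cost of a Python-level function call
        out[idx] = func(A[idx])
    return out

A = np.linspace(0, np.pi, 5)
print(np.allclose(loop_apply(math.sin, A), np.sin(A)))
```

Every element goes through the interpreter once, which is why the cost ends up close to that of the list comprehension.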
For completeness, here's the "transformed" version. As you can see, the try/except block has a negligible impact on performance:
In [5]: %%timeit A = np.random.randn(10000);
transform(A, np.sin)
...:
1000 loops, best of 3: 241 µs per loop
Upvotes: 3