Apply both vectorized and non-vectorized functions to a NumPy array

I have a function that takes a given NumPy array A and a given function func and applies the function to each element of the array:

def transform(A, func):
    return func(A)

A and func are supplied from outside and I have no control over them. I would like this to work for vectorized functions such as transform(A, np.sin), but also for ordinary Python functions, e.g. lambdas like transform(A, lambda x: x**2 if x > 5 else 0). The second one is not vectorized, so I would need to wrap it with np.vectorize() before applying it, like this: transform(A, np.vectorize(lambda x: x**2 if x > 5 else 0)). But I do not want to impose this burden on the users. I would like a unified approach for all functions: I just get a function from outside and apply it.
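To make the difference concrete, here is a minimal sketch using the squaring lambda from the question (written with `**`, Python's power operator) on a small example array chosen for illustration. The `otypes=[float]` argument is an addition of this sketch: without it, np.vectorize infers the output dtype from the first call, and this lambda returns the int 0 for the first element, which would make the whole result an int array.

```python
import numpy as np

# The squaring lambda from the question; works only on scalars
f = lambda x: x**2 if x > 5 else 0

A = np.array([1.0, 6.0, 10.0])  # example input, chosen for illustration

# Fine on a single scalar
print(f(7.0))

# On the whole array, `A > 5` is a boolean array and the Python `if`
# cannot reduce it to one truth value, so f(A) raises ValueError.
# np.vectorize instead calls f once per element.
# otypes=[float] pins the output dtype; otherwise NumPy would infer it
# from f(1.0), which returns the int 0.
vf = np.vectorize(f, otypes=[float])
result = vf(A)
print(result)
```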

Is there a method to decide which function requires vectorization and which does not? Something like:

def transform(A, func):
    if requires_vectorization(func):  # how to do this???
        func = np.vectorize(func)
    return func(A)   
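One possible way to fill in the hypothetical requires_vectorization above is a type check against np.ufunc: NumPy ufuncs such as np.sin already operate element-wise on arrays, while scalar functions such as math.sin are not ufuncs. This is only a heuristic sketch, not an established API: a Python function built from NumPy operations (e.g. lambda x: np.sin(x) + 1) is array-aware but not a ufunc, so this check would wrap it unnecessarily.

```python
import math
import numpy as np

def requires_vectorization(func):
    # Hypothetical heuristic: only NumPy ufuncs are treated as vectorized.
    # Array-aware plain Python functions are misclassified and get wrapped.
    return not isinstance(func, np.ufunc)

def transform(A, func):
    if requires_vectorization(func):
        func = np.vectorize(func)  # per-element Python loop
    return func(A)

A = np.linspace(0, np.pi, 5)
print(requires_vectorization(np.sin))    # False
print(requires_vectorization(math.sin))  # True
print(transform(A, math.sin))
```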

Or should I just vectorize all functions by default?

def transform(A, func):
    func = np.vectorize(func)  # is this correct and efficient?
    return func(A)   

Is this solution good? In other words, does it hurt to call np.vectorize on an already vectorized function? Or is there an alternative?

Answers (1)

ali_m

Following the EAFP principle ("easier to ask forgiveness than permission"), you could first try calling the function directly on A and fall back to np.vectorize only if that raises an exception:

import numpy as np

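def transform(A, func):
    try:
        # array-aware functions such as np.sin handle the whole array at once
        return func(A)
    except (TypeError, ValueError):
        # scalar-only functions fail on arrays: math.sin raises TypeError,
        # while a Python `if` on an array comparison raises ValueError
        return np.vectorize(func)(A)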

For example:

import math

A = np.linspace(0, np.pi, 5)

print(transform(A, np.sin))     # vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]

print(transform(A, math.sin))   # non-vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]
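The question's own squaring lambda, lambda x: x**2 if x > 5 else 0, fails differently from math.sin: calling it on a whole array raises ValueError (the `if` cannot turn a boolean array into a single truth value) rather than TypeError. A self-contained sketch of the same EAFP pattern that catches both exception types, applied to that lambda on an example array chosen for illustration:

```python
import math
import numpy as np

def transform(A, func):
    try:
        # Array-aware functions (e.g. np.sin) succeed here in one call
        return func(A)
    except (TypeError, ValueError):
        # math.sin raises TypeError on a multi-element array; a Python `if`
        # on an array comparison raises ValueError. Fall back to a
        # per-element loop via np.vectorize.
        return np.vectorize(func)(A)

A = np.linspace(0, 10, 6)  # [0, 2, 4, 6, 8, 10]
result = transform(A, lambda x: x**2 if x > 5 else 0)
print(result)
```

Note that without an `otypes` argument np.vectorize takes the output dtype from the first call, here the int 0, so this result comes back as an integer array.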

Does it hurt to call np.vectorize on an already vectorized function?

Yes, absolutely. When you apply np.vectorize to a function, all of the looping over input array elements is done in Python, unlike in "proper" vectorized numpy functions which do their looping in C. From the documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I feel like this sentence should be written in bold all-caps.

Case in point:

In [1]: vecsin = np.vectorize(np.sin)

In [2]: %%timeit A = np.random.randn(10000);
np.sin(A)
   ....: 
1000 loops, best of 3: 243 µs per loop

In [3]: %%timeit A = np.random.randn(10000);
vecsin(A)
   ....: 
100 loops, best of 3: 11.7 ms per loop

In [4]: %%timeit A = np.random.randn(10000);
[np.sin(a) for a in A]
   ....: 
100 loops, best of 3: 12.5 ms per loop

In this example, applying np.vectorize to np.sin slows it down by a factor of ~50 (11.7 ms vs. 243 µs per loop), making it about as slow as a regular Python list comprehension.
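The near-identical timings of vecsin and the list comprehension fit the documentation's description of np.vectorize as essentially a for loop. As a rough mental model only, the per-element work looks like the hypothetical loop_vectorize helper below; the real np.vectorize additionally handles broadcasting over several arguments, `otypes`, `excluded` and more.

```python
import math
import numpy as np

def loop_vectorize(func):
    # Hypothetical helper for illustration, not NumPy's actual code:
    # call func once per element in an ordinary Python loop, which is
    # where the per-element interpreter overhead comes from.
    def wrapped(A):
        A = np.asarray(A)
        results = [func(a) for a in A.ravel()]
        return np.array(results).reshape(A.shape)
    return wrapped

A = np.random.randn(10000)

looped = loop_vectorize(math.sin)(A)        # explicit Python loop
vectorized = np.vectorize(math.sin)(A)      # NumPy's convenience wrapper
```

Both produce the same values as np.sin(A); only np.sin does its looping in C.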

Edit:

For completeness, here's the "transformed" version. As you can see, the try/except block has a negligible impact on performance:

In [5]: %%timeit A = np.random.randn(10000);
transform(A, np.sin)
   ...: 
1000 loops, best of 3: 241 µs per loop
