Reputation: 102
I'm trying to understand NumPy by applying vectorisation, and I'm looking for the fastest function to compute all pairwise distances between a set of points.
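import numpy as np
def get_distances3(coordinates):
    # Broadcast to an (n, n, 3) array of pairwise differences,
    # then take the norm over the last axis to get an (n, n) result
    return np.linalg.norm(
        coordinates[:, None, :] - coordinates[None, :, :],
        axis=-1)
coordinates = np.random.rand(1000, 3)
%timeit get_distances3(coordinates)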
The function above runs in 35.4 ms per loop (10 loops, best of 3). NumPy also has an np.vectorize option that looks like it could do this.
def get_distances4(coordinates):
    return np.vectorize(coordinates[:, None, :] - coordinates[None, :, :], axis=-1)
%timeit get_distances4(coordinates)
I tried np.vectorize as shown above, but ended up with the following error.
TypeError: __init__() got an unexpected keyword argument 'axis'
How can I use np.vectorize in get_distances4? How should I edit the last code block to avoid the error? I have never used np.vectorize, so I might be missing something.
Upvotes: 2
Views: 490
Reputation: 23376
You're not calling np.vectorize() correctly. I suggest referring to the documentation.
np.vectorize takes as its argument a function written to operate on scalar values, and converts it into a function that is applied across the values in arrays according to the NumPy broadcasting rules. It's basically a fancy map() for NumPy arrays.
As you know, NumPy already has built-in vectorized versions of many common functions. But if you had a custom function such as my_special_function(x) and wanted to call it on NumPy arrays, you could use my_special_function_ufunc = np.vectorize(my_special_function).
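As a minimal sketch of that idea: the body of my_special_function below is made up purely for illustration (a scalar-only function built on math.sqrt, which does not accept multi-element arrays).

```python
import math
import numpy as np

def my_special_function(x):
    # Hypothetical scalar-only function: math.sqrt works on a single number,
    # and the "x >= 0" test is ambiguous for a multi-element array.
    return math.sqrt(x) if x >= 0 else 0.0

# Wrap it so it can be applied element by element to an array.
my_special_function_ufunc = np.vectorize(my_special_function)

values = np.array([4.0, -1.0, 9.0])
result = my_special_function_ufunc(values)
print(result)
```

Calling my_special_function(values) directly would raise an error, while the wrapped version returns an array of the same shape as its input.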
In your above example you might "vectorize" your distance function like:
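>>> norm = np.linalg.norm
>>> # The signature makes np.vectorize pass whole coordinate vectors to the
>>> # lambda rather than individual scalars (requires NumPy >= 1.12).
>>> get_distance4 = np.vectorize(lambda a, b: norm(a - b), signature='(n),(n)->()')
>>> get_distance4(coordinates[:, None, :], coordinates[None, :, :])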
However, you will find that this is incredibly slow:
>>> %timeit get_distance4(coordinates[:, None, :], coordinates[None, :, :])
1 loop, best of 3: 10.8 s per loop
This is because your first example, get_distances3, already uses NumPy's built-in fast implementations of these operations, whereas the np.vectorize version calls the Python function I defined at least once for each of the 1000 × 1000 pairs of points, i.e. on the order of a million times.
In fact according to the docs:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
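To make the "for loop" behaviour concrete, here is a small sketch that counts how often the wrapped function is called. The name scalar_abs_diff and the 50-point input are assumptions chosen for illustration. otypes is passed so that np.vectorize does not make an extra trial call to infer the output type.

```python
import numpy as np

call_count = {"n": 0}

def scalar_abs_diff(a, b):
    # Hypothetical scalar function; it only ever sees one pair of numbers.
    call_count["n"] += 1
    return abs(a - b)

vectorized = np.vectorize(scalar_abs_diff, otypes=[float])

points = np.random.rand(50, 3)
# The inputs broadcast to shape (50, 50, 3); the Python function is
# invoked once per element of that broadcast result.
out = vectorized(points[:, None, :], points[None, :, :])

print(out.shape, call_count["n"])
```

Without a signature argument, the wrapped function receives scalars, so the call count equals the number of elements in the broadcast result rather than the number of point pairs.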
If you want a potentially faster function for computing the distances between vectors, you could use scipy.spatial.distance.pdist:
>>> from scipy.spatial import distance
>>> %timeit get_distances3(coordinates)
10 loops, best of 3: 24.2 ms per loop
>>> %timeit distance.pdist(coordinates)
1000 loops, best of 3: 1.77 ms per loop
It's worth noting that this has a different return format. Rather than a 1000x1000 array, it uses a condensed format that excludes the i = j entries and the i > j entries. If you wish, you can then use scipy.spatial.distance.squareform to convert back to the square matrix format.
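A short sketch of the two formats, using a smaller 100-point input (an assumption, to keep the example quick) and checking the result against the broadcasting approach from the question:

```python
import numpy as np
from scipy.spatial import distance

coordinates = np.random.rand(100, 3)

# Condensed form: one entry per unordered pair (i < j), n*(n-1)/2 in total.
condensed = distance.pdist(coordinates)

# Square form: full symmetric (n, n) matrix with zeros on the diagonal.
square = distance.squareform(condensed)

# Reference result from the broadcasting approach in the question.
reference = np.linalg.norm(
    coordinates[:, None, :] - coordinates[None, :, :], axis=-1)

print(condensed.shape, square.shape, np.allclose(square, reference))
```

If only the distances between distinct pairs are needed, staying in the condensed format avoids storing each distance twice.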
Upvotes: 2