Reputation: 2238
I have the following function to get the Euclidean distance between two vectors a and b.
import numpy as np

def distance_func(a, b):
    # Euclidean distance between vectors a and b
    distance = np.linalg.norm(b - a)
    return distance
Here, I want a to be an element of an array of vectors, so I used numpy.vectorize to iterate over the array (to get better speed than iterating with a for loop).
vfunc = np.vectorize(distance_func)
I used it as follows to get an array of Euclidean distances:
a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6]])
b = np.array([1,2])
vfunc(a,b)
But this function returns:
array([[ 0., 0.], [ 1., 1.], [ 2., 2.], [ 3., 3.], [ 4., 4.]])
This looks like the result of performing np.linalg.norm(a-b) on each scalar element individually (i.e. |a[i, j] - b[j]|), rather than on each row vector.
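np.vectorize broadcasts its inputs and calls the function once per scalar pair, so distance_func never receives a whole row. A short check, using the question's own a and b, confirms that the output is just the element-wise absolute difference:

```python
import numpy as np

def distance_func(a, b):
    return np.linalg.norm(b - a)

a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])

vfunc = np.vectorize(distance_func)
result = vfunc(a, b)

# b is broadcast to shape (5, 2) and distance_func is called on scalar pairs,
# so each entry is |a[i, j] - b[j]| rather than a row-wise distance.
print(result.shape)                        # (5, 2)
print(np.allclose(result, np.abs(a - b)))  # True
```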
How do I use numpy.vectorize to get an array of Euclidean distances, one per row of a?
Upvotes: 1
Views: 3034
Reputation: 14377
If you want to compute Euclidean distances between all of your data points, you should use one of the functions provided for this purpose:
from sklearn.metrics import euclidean_distances
from scipy.spatial import distance_matrix
They are optimized to calculate the distances from several points a to several points b in a fully vectorized manner.
import numpy as np
a = np.random.randn(100, 2)
b = np.random.randn(200, 2)
d1 = euclidean_distances(a, b)
d2 = distance_matrix(a, b, p=2)
print(d1.shape)  # yields (100, 200), one distance for each possible pair
print(d2.shape)
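For intuition, the same all-pairs matrix can be built with plain NumPy broadcasting. This is a sketch for comparison only, not how either library implements it; note that it materializes an intermediate array of shape (n, m, d), which can get large for big inputs:

```python
import numpy as np

a = np.random.randn(100, 2)
b = np.random.randn(200, 2)

# Insert singleton axes so a - b broadcasts to shape (100, 200, 2):
# diff[i, j] = a[i] - b[j] for every pair (i, j).
diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]

# Taking the norm over the last (coordinate) axis gives one distance per pair.
d = np.linalg.norm(diff, axis=-1)
print(d.shape)  # (100, 200)

# Spot-check one entry against the direct formula.
print(np.isclose(d[3, 7], np.linalg.norm(a[3] - b[7])))  # True
```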
Speed considerations
In [90]: %timeit d1 = euclidean_distances(a, b)
1000 loops, best of 3: 403 us per loop
In [91]: %timeit d2 = distance_matrix(a, b, p=2)
1000 loops, best of 3: 699 us per loop
Upvotes: 1
Reputation: 69202
You don't need to use vectorize; you can just do:
import numpy as np

a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6]])
b = np.array([1,2])
np.linalg.norm(a-b, axis=1)  # one Euclidean distance per row of a
which gives:
[ 0. 1.41421356 2.82842712 4.24264069 5.65685425]
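Here a - b broadcasts b against every row of a, giving a (5, 2) array of differences, and axis=1 makes norm reduce each row separately. Written out by hand, the computation is equivalent to:

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])

diff = a - b                               # shape (5, 2); b is broadcast over the rows
manual = np.sqrt((diff ** 2).sum(axis=1))  # square root of the sum of squares per row
print(np.allclose(manual, np.linalg.norm(diff, axis=1)))  # True
```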
(I assume this is what you want, but if not, please also show the result you expect for your example.)
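If you specifically want np.vectorize, its signature argument lets the function receive whole rows instead of scalars. Keep in mind that the NumPy documentation describes vectorize as a convenience rather than a performance tool (it is essentially a for loop), so this should not be expected to beat the axis=1 version above:

```python
import numpy as np

def distance_func(a, b):
    return np.linalg.norm(b - a)

# '(n),(n)->()' declares that each call takes two length-n vectors and
# returns a scalar; vectorize then loops over the remaining (row) axis.
vfunc = np.vectorize(distance_func, signature='(n),(n)->()')

a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])
result = vfunc(a, b)
print(result)
```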
Upvotes: 4