Vectorize over only one axis in a 2D array with numpy vectorize

Question

I have the following function to get the Euclidean distance between two vectors a and b.

def distance_func(a,b):
    distance = np.linalg.norm(b-a)
    return distance

Here, I want a to be each element of an array of vectors, so I used np.vectorize to iterate over the array (hoping for better speed than iterating with a for loop).

vfunc = np.vectorize(distance_func)

I used this as follows to get an array of Euclidean distances:

a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6]])
b = np.array([1,2])

vfunc(a,b)

But this function returns:

array([[ 0., 0.], [ 1., 1.], [ 2., 2.], [ 3., 3.], [ 4., 4.]])

This is the result of applying np.linalg.norm(b-a) to each pair of scalar elements individually, with b broadcast against every row of a. How do I use np.vectorize so that each row of a is treated as a vector, giving a one-dimensional array of Euclidean distances?

eickenberg · Accepted Answer

If you want to compute Euclidean distances between all of your data points, you should use one of the functions provided for this purpose:

from sklearn.metrics import euclidean_distances
from scipy.spatial import distance_matrix

They are optimized to calculate the distances from several points a to several points b in a fully vectorized manner.

import numpy as np
a = np.random.randn(100, 2)
b = np.random.randn(200, 2)

d1 = euclidean_distances(a, b)
d2 = distance_matrix(a, b, p=2)
print(d1.shape)  # yields (100, 200), one distance for each possible pair
print(d2.shape)
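For the specific case in the question (many vectors against a single vector), plain NumPy broadcasting may already be enough. The following is a minimal sketch, not part of the original answer, reusing the question's a and b; the variable name distances is my own.

```python
import numpy as np

# Data from the question: five 2D vectors and one reference vector.
a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])

# b is broadcast against every row of a; axis=1 takes the norm of each
# row of the difference, giving one distance per vector in a.
distances = np.linalg.norm(a - b, axis=1)
print(distances.shape)  # (5,)
```

This returns a one-dimensional array with one entry per row of a, instead of the (5, 2) array of element-wise differences shown in the question.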

Speed considerations

In [90]: %timeit d1 = euclidean_distances(a, b)
1000 loops, best of 3: 403 us per loop

In [91]: %timeit d2 = distance_matrix(a, b, p=2)
1000 loops, best of 3: 699 us per loop
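On the speed point, the NumPy documentation describes np.vectorize as provided primarily for convenience rather than performance, with an implementation that is essentially a for loop. If you still want to use it as the question asks, one possible sketch is to pass a signature so the function receives whole rows rather than scalars. This assumes NumPy 1.12 or later, where the signature argument was added; the name vfunc_rows is my own and does not come from the original post.

```python
import numpy as np

def distance_func(a, b):
    # Euclidean distance between two 1D vectors.
    return np.linalg.norm(b - a)

# "(n),(n)->()" tells np.vectorize that each call takes two length-n
# vectors and returns a scalar, so it loops over rows, not elements.
vfunc_rows = np.vectorize(distance_func, signature="(n),(n)->()")

a = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
b = np.array([1, 2])

result = vfunc_rows(a, b)
print(result.shape)  # (5,)
```

Since this still calls distance_func once per row in Python, it should not be expected to outperform the fully vectorized functions timed above.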
