Euclidean distance between the two points using vectorized approach

Question

I have two large numpy arrays and want to calculate the Euclidean distance between their corresponding rows using sklearn. The following MRE produces the result I want, but since my real-life usage is large, I really want a vectorized solution rather than a for loop.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

n = 3
sample_size = 5

X = np.random.randint(0, 10, size=(sample_size, n))
Y = np.random.randint(0, 10, size=(sample_size, n))

lst = []

for f in range(0, sample_size):
    ed = euclidean_distances([X[f]], [Y[f]])
    lst.append(ed[0][0])

print(lst)

anon01 · Accepted Answer

euclidean_distances computes the distance for every combination of points in X and Y, so two inputs with sample_size rows each produce a sample_size × sample_size matrix. This grows large in memory and is unnecessary if you only want the distance between each pair of corresponding rows. Sklearn includes a different function, paired_distances, that does what you want (its default metric is Euclidean):

from sklearn.metrics.pairwise import paired_distances
d = paired_distances(X,Y)
# array([5.83095189, 9.94987437, 7.34846923, 5.47722558, 4.        ])

If you need the full pairwise distances anyway, you can get the same result from the diagonal of that matrix (as pointed out in the comments):

d = euclidean_distances(X,Y).diagonal()
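
To make the memory point concrete, here is a minimal numpy-only sketch. It does not call sklearn; the broadcasting expression is just a stand-in for the all-pairs matrix that euclidean_distances returns, and the seed and the names full and paired are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
sample_size, n = 5, 3
X = rng.integers(0, 10, size=(sample_size, n))
Y = rng.integers(0, 10, size=(sample_size, n))

# All-pairs distances via broadcasting: full[i, j] is the distance from X[i] to Y[j].
# The intermediate difference array has shape (sample_size, sample_size, n).
full = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

# Row-wise distances: paired[i] is the distance from X[i] to Y[i].
paired = np.linalg.norm(X - Y, axis=1)

print(full.shape)    # (5, 5)
print(paired.shape)  # (5,)
print(np.allclose(full.diagonal(), paired))  # True
```

The all-pairs result grows with the square of the number of rows, while the row-wise result grows linearly, which is why taking the diagonal only makes sense if you need the full matrix for something else.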

Lastly, X and Y are numpy arrays, so it is useful to know the numpy API itself (which is probably what sklearn calls under the hood). Here are two equivalent ways to compute the row-wise distances:

d = np.linalg.norm(X-Y, axis=1)
d = np.sqrt(np.sum((X-Y)**2, axis=1))
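
As a quick sanity check, the sketch below compares both numpy expressions against an explicit per-row loop. The loop here is written in plain numpy rather than with sklearn, and the seed and variable names (looped, d_norm, d_sqrt) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for repeatability
X = rng.integers(0, 10, size=(5, 3))
Y = rng.integers(0, 10, size=(5, 3))

# Reference: one distance per row, computed with an explicit Python loop.
looped = np.array([np.sqrt(np.sum((x - y) ** 2)) for x, y in zip(X, Y)])

# Vectorized versions: axis=1 reduces over the coordinates of each row.
d_norm = np.linalg.norm(X - Y, axis=1)
d_sqrt = np.sqrt(np.sum((X - Y) ** 2, axis=1))

print(np.allclose(looped, d_norm) and np.allclose(looped, d_sqrt))  # True
```

Both vectorized forms return an array of shape (sample_size,), one distance per row, so either can replace the loop from the question directly.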
