garak

Reputation: 4803

Multidimensional Euclidean Distance in Python

I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between two arrays. I'm using NumPy and SciPy.

Here is my code:

import numpy,scipy;

A=numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07, 7.74233e+06, 2.85839e+08, 2.30168e+08, 5.6919e+08, 168989, 7.48866e+06, 1.45261e+06, 7.49496e+07, 2.13295e+07, 3.74361e+08, 54.5, 3349.39, 262.614, 16175.8, 3693.79, 205865]);

B=numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151246, 6795630, 4566625, 2.0355328e+08, 1.4250515e+08, 3.2699482e+08, 95635, 4470961, 589043, 29729866, 6124073, 222.3]);

I used scipy.spatial.distance.cdist(A[numpy.newaxis,:],B,'euclidean') to calculate the Euclidean distance.

But it gave me an error:

raise ValueError('XB must be a 2-dimensional array.');

I don't understand this error.

I also looked up scipy.spatial.distance.pdist, but I don't understand how to use it.

Is there a better way to do this?

Upvotes: 22

Views: 37539

Answers (7)

eroot163pi

Reputation: 1815

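Writing your own square-root-of-sum-of-squares is not always safe

You can use math.hypot, numpy.hypot, or a SciPy distance function rather than writing numpy.sqrt(numpy.sum((A - B)**2)) or (i**2 + j**2)**0.5 yourself. The hand-written forms can overflow or underflow in the intermediate squares when the values are extreme. Note that numpy.hypot works elementwise on two arguments, while math.hypot accepts any number of coordinates from Python 3.8 on.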

refer

Speed wise

%%timeit
math.hypot(*(A - B))
# 3 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
numpy.sqrt(numpy.sum((A - B)**2))
# 5.65 µs ± 50.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Safety wise

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

i, j = np.float64(1e+200), np.float64(1e+200)  # with plain Python floats, i**2 raises OverflowError instead
np.sqrt(i**2+j**2)
# inf

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
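For the n-dimensional case in the question, the same idea can be sketched with the multi-argument form of math.hypot. This is a minimal sketch, assuming Python 3.8 or later; the helper name hypot_distance and the sample points are made up for illustration.

```python
import math

def hypot_distance(a, b):
    """Euclidean distance via math.hypot (multi-argument form needs Python 3.8+)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # hypot receives the coordinate differences, not their squares
    return math.hypot(*(x - y for x, y in zip(a, b)))

# Illustrative points with extreme coordinates
p = [1e200, 0.0, 0.0]
q = [0.0, 1e200, 0.0]

d = hypot_distance(p, q)  # stays finite

# Hand-written version: each product overflows to inf before the square root
naive = math.sqrt(sum((x - y) * (x - y) for x, y in zip(p, q)))

print(d, naive)
```

The hand-written version uses multiplication rather than ** because raising a plain Python float to a power raises OverflowError when the result is out of range.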

Upvotes: 1

Xavier Guihot

Reputation: 61666

Starting with Python 3.8, you can use the standard library's math module and its new dist function, which returns the Euclidean distance between two points (given as lists or tuples of coordinates):

from math import dist

dist([1, 0, 0], [0, 1, 0]) # 1.4142135623730951
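The same call works for any number of dimensions, as long as both points have the same length. A small sketch, with made-up sample points, that also shows the ValueError raised on a dimension mismatch:

```python
from math import dist

# Two illustrative 4-dimensional points
p = (1.0, 2.0, 3.0, 4.0)
q = (2.0, 3.0, 4.0, 5.0)

d = dist(p, q)  # square root of 1 + 1 + 1 + 1
print(d)

# Points of different lengths are rejected rather than silently truncated
try:
    dist((1.0, 2.0), (1.0, 2.0, 3.0))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```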

Upvotes: 11

Michael Mior

Reputation: 28752

Perhaps scipy.spatial.distance.euclidean?

Examples

>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
>>> distance.euclidean([1, 1, 0], [0, 1, 0])
1.0

Upvotes: 27

Fohlen

Reputation: 296

Since all of the above answers refer to numpy and/or scipy, I just wanted to point out that something really simple can be done with reduce here:

from functools import reduce
from math import sqrt

def n_dimensional_euclidean_distance(a, b):
   """
   Returns the euclidean distance for n>=2 dimensions
   :param a: tuple of numbers
   :param b: tuple of numbers
   :return: the euclidean distance as a float
   """
   dimension = len(a) # note: raises an IndexError if len(a) > len(b); extra entries in b are silently ignored

   # accumulate the squared differences over every index, starting from 0
   return sqrt(reduce(lambda i,j: i + ((a[j] - b[j]) ** 2), range(dimension), 0))

This will sum all pairs of (a[j] - b[j])^2 for all j in the number of dimensions (note that for simplicity this doesn't support n<2 dimensional distance).
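The same sum can also be written with zip and a generator expression, which some may find easier to read. This is an alternative sketch, not the answer's original code; the name euclidean_distance and the explicit length check are additions for illustration.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of dimensions")
    # pair up the coordinates and sum the squared differences
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((1, 0, 0), (0, 1, 0)))
print(euclidean_distance((1, 1, 0), (0, 1, 0)))
```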

Upvotes: 5

Fred Foo

Reputation: 363567

Apart from the already mentioned ways of computing the Euclidean distance, here's one that's close to your original code:

scipy.spatial.distance.cdist([A], [B], 'euclidean')

or

scipy.spatial.distance.cdist(np.atleast_2d(A), np.atleast_2d(B), 'euclidean')

This returns a 1×1 np.ndarray holding the L2 distance.
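The error in the question comes from the shapes: A[numpy.newaxis,:] is 2-D, but B was passed as a 1-D array. The sketch below only inspects shapes with NumPy and computes the row-by-row distances with broadcasting as a stand-in for cdist, so it runs without SciPy; the arrays are placeholder data, not the ones from the question.

```python
import numpy as np

# Placeholder 24-dimensional points (not the arrays from the question)
A = np.arange(24, dtype=float)
B = np.zeros(24)

print(A[np.newaxis, :].shape)  # (1, 24): one row of 24 coordinates
print(B.shape)                 # (24,): 1-D, which is what the error complains about

# Promote both to 2-D: one row per point
A2, B2 = np.atleast_2d(A), np.atleast_2d(B)

# Stand-in for cdist: distance between every row of A2 and every row of B2
pairwise = np.linalg.norm(A2[:, np.newaxis, :] - B2[np.newaxis, :, :], axis=-1)

print(pairwise.shape)          # (1, 1)
print(pairwise[0, 0])
```

Indexing the result with [0, 0] extracts the single distance as a scalar.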

Upvotes: 4

Ade YU

Reputation: 2362

A and B are 2 points in the 24-D space. You should use scipy.spatial.distance.euclidean.

Doc here

scipy.spatial.distance.euclidean(A, B)

Upvotes: 7

YXD

Reputation: 32511

Use either

numpy.sqrt(numpy.sum((A - B)**2))

or more simply

numpy.linalg.norm(A - B)
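If there are several points to compare against one target, numpy.linalg.norm can compute one distance per row through its axis argument. A short sketch with made-up points:

```python
import numpy as np

# Three illustrative 3-dimensional points, one per row
points = np.array([[1.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [3.0, 4.0, 0.0]])
target = np.array([0.0, 1.0, 0.0])

# Broadcasting subtracts target from every row; axis=1 takes the norm of each row
distances = np.linalg.norm(points - target, axis=1)

# The explicit form gives the same values
explicit = np.sqrt(np.sum((points - target) ** 2, axis=1))

print(distances)
```

Without axis=1, norm would treat the 2-D difference as a single matrix and return one Frobenius norm instead of one distance per point.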

Upvotes: 14
