Reputation: 4803
I want to calculate the Euclidean distance between two arrays in multiple dimensions (24 dimensions). I'm using NumPy and SciPy.
Here is my code:
import numpy,scipy;
A=numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07, 7.74233e+06, 2.85839e+08, 2.30168e+08, 5.6919e+08, 168989, 7.48866e+06, 1.45261e+06, 7.49496e+07, 2.13295e+07, 3.74361e+08, 54.5, 3349.39, 262.614, 16175.8, 3693.79, 205865]);
B=numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151246, 6795630, 4566625, 2.0355328e+08, 1.4250515e+08, 3.2699482e+08, 95635, 4470961, 589043, 29729866, 6124073, 222.3]);
However, when I used scipy.spatial.distance.cdist(A[numpy.newaxis,:],B,'euclidean') to calculate the Euclidean distance, it gave me an error:
raise ValueError('XB must be a 2-dimensional array.');
I don't understand it.
I looked up scipy.spatial.distance.pdist but don't understand how to use it.
Is there a better way to do this?
Upvotes: 22
Views: 37539
Reputation: 1815
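Writing your own square root of a sum of squares is not always safe.
You can use math.hypot, numpy.hypot or a SciPy distance function rather than writing numpy.sqrt(numpy.sum((A - B)**2)) or (i**2 + j**2)**0.5. With very large or very small values, the intermediate squares can overflow or underflow. Note that math.hypot accepts more than two arguments only from Python 3.8 onward, and numpy.hypot takes exactly two.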
%%timeit
math.hypot(*(A - B))
# 3 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
numpy.sqrt(numpy.sum((A - B)**2))
# 5.65 µs ± 50.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
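import numpy as np
i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0
i, j = np.float64(1e+200), np.float64(1e+200)  # NumPy scalars; plain Python floats raise OverflowError on i**2
np.sqrt(i**2+j**2)
# inf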
i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200
i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
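To illustrate why a hypot-style computation survives these extreme values, here is a minimal pure-Python sketch of the rescaling idea. The function name scaled_euclidean is hypothetical, and this is not how math.hypot or numpy.hypot are actually implemented; it only shows the general technique of dividing by the largest difference before squaring.

```python
import math

def scaled_euclidean(a, b):
    """Euclidean distance that rescales by the largest difference first."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    largest = max(diffs, default=0.0)
    if largest == 0.0:
        return 0.0
    # Every ratio d / largest is at most 1, so its square cannot overflow.
    return largest * math.sqrt(sum((d / largest) ** 2 for d in diffs))

big = scaled_euclidean([1e200, 1e200], [0.0, 0.0])
small = scaled_euclidean([1e-200, 1e-200], [0.0, 0.0])
```

Both big and small come out as finite, nonzero values, whereas squaring the raw differences would overflow or underflow.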
Upvotes: 1
Reputation: 61666
Starting with Python 3.8, you can use the standard library's math module and its new dist function, which returns the Euclidean distance between two points (given as lists or tuples of coordinates):
from math import dist
dist([1, 0, 0], [0, 1, 0]) # 1.4142135623730951
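As a small sketch of how math.dist behaves beyond three dimensions, the example below uses made-up 4-D points and also shows that points of different lengths are rejected with a ValueError. The variable names are illustrative only.

```python
from math import dist, sqrt

p = [1.0, 2.0, 3.0, 4.0]
q = [0.0, 0.0, 0.0, 0.0]

# Distance from the origin: sqrt(1 + 4 + 9 + 16)
d = dist(p, q)

# Points with a different number of coordinates are rejected.
try:
    dist([1.0, 2.0, 3.0], [1.0, 2.0])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```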
Upvotes: 11
Reputation: 28752
Perhaps scipy.spatial.distance.euclidean?
Examples
>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
>>> distance.euclidean([1, 1, 0], [0, 1, 0])
1.0
Upvotes: 27
Reputation: 296
Since all of the above answers refer to numpy and/or scipy, I just wanted to point out that something really simple can be done with reduce here:
from functools import reduce
from math import sqrt

def n_dimensional_euclidean_distance(a, b):
    """
    Returns the Euclidean distance between two n-dimensional points
    :param a: tuple of numbers
    :param b: tuple of numbers
    :return: the Euclidean distance as a float
    """
    dimension = len(a)  # note: this raises an IndexError if len(a) > len(b)
    # accumulate the squared difference for each dimension, then take the root
    return sqrt(reduce(lambda i, j: i + ((a[j] - b[j]) ** 2), range(dimension), 0))
This sums (a[j] - b[j])^2 over every dimension j and takes the square root of the total.
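For comparison, the same sum of squared differences can be sketched with zip and sum instead of reduce. The function name euclidean_distance and the explicit length check are assumptions added for illustration, not part of the answer above.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        # zip would silently truncate to the shorter input, so check first
        raise ValueError("both points must have the same number of dimensions")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

result = euclidean_distance((1, 0, 0), (0, 1, 0))
```

Pairing coordinates with zip avoids indexing altogether, at the cost of needing the explicit length check.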
Upvotes: 5
Reputation: 363567
Apart from the already mentioned ways of computing the Euclidean distance, here's one that's close to your original code:
scipy.spatial.distance.cdist([A], [B], 'euclidean')
or
scipy.spatial.distance.cdist(np.atleast_2d(A), np.atleast_2d(B), 'euclidean')
This returns a 1×1 np.ndarray holding the L2 distance.
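The error in the question comes from passing a 1-D array as the second argument: cdist expects both inputs to be 2-D, with one row per point. A minimal sketch, using short made-up vectors instead of the 24-D arrays, that reproduces the error and then extracts the scalar from the 1×1 result:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

# Only the first argument is 2-D here, as in the question.
try:
    cdist(a[np.newaxis, :], b, "euclidean")
    raised = False
except ValueError:
    raised = True

# Giving both arguments shape (1, n) yields a (1, 1) distance matrix.
pairwise = cdist(a[np.newaxis, :], b[np.newaxis, :], "euclidean")
distance = pairwise[0, 0]
```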
Upvotes: 4
Reputation: 2362
A and B are 2 points in the 24-D space. You should use scipy.spatial.distance.euclidean:
scipy.spatial.distance.euclidean(A, B)
Upvotes: 7
Reputation: 32511
Use either
numpy.sqrt(numpy.sum((A - B)**2))
or more simply
numpy.linalg.norm(A - B)
Upvotes: 14