Reputation: 22244
Suppose I have a ndarray of coordinates of shape (n, 2) where each coordinate is (x, y).
X = np.random.random(shape) * 10 # just to generate (x,y).
---
[[9.47743968 8.60682597]
[7.35620992 6.87031756]
[5.05200433 3.62373581]
[4.33732145 3.72994235]
[4.34982473 4.46453609]]
...
A distance between two vectors Xi
and Xj
in X is |Xj - Xi|
. Get the distances of all combinations in X could be done as in the image.
Is it possible to do this with numpy only, not scipy (e.g. scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs)? Please help understand available numpy functions and how to achieve it.
I though I could look into scipy code as scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs) seems to get the distances from vectors.
Pairwise distances between observations in n-dimensional space.
import scipy.spatial
scipy.spatial.distance.pdist(X)
---
array([2.74136411, 6.6645079 , 7.08553522, 6.59173729, 3.9811627 ,
4.35610423, 3.85047223, 0.72253128, 1.09544571, 0.73470014])
scipy distance.py line 2050-2059 appears to be the code that calls a distance function for a corresponding method (e.g. euclidean). However it goes into a C code distance_wrap.c, hence not numpy.
static PyObject *pdist_seuclidean_double_wrap(PyObject *self, PyObject *args,
PyObject *kwargs)
{
PyArrayObject *X_, *dm_, *var_;
int m, n;
double *dm;
const double *X, *var;
static char *kwlist[] = {"X", "dm", "V", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs,
"O!O!O!:pdist_seuclidean_double_wrap", kwlist,
&PyArray_Type, &X_,
&PyArray_Type, &dm_,
&PyArray_Type, &var_)) {
return 0;
}
else {
NPY_BEGIN_ALLOW_THREADS;
X = (double*)X_->data;
dm = (double*)dm_->data;
var = (double*)var_->data;
m = X_->dimensions[0];
n = X_->dimensions[1];
pdist_seuclidean(X, var, dm, m, n);
NPY_END_ALLOW_THREADS;
}
return Py_BuildValue("d", 0.0);
}
Upvotes: 0
Views: 694
Reputation: 4313
Example code that follow your process in the drawing with numpy:
import numpy as np
N = 3
shape = (N, 2)
X = np.random.random(shape) * 10
print("X:\n{}\n\n".format(X))
X_br = np.broadcast_to(X, (N, N, 2))
X_bc = np.repeat(X[:, np.newaxis], N, axis=1)
print("X broadcast to row:\n{}\n\nX broadcast to column:\n{}\n\n".format(X_br, X_bc))
sub_matrix = X_bc - X_br
distance_matrix = np.sqrt(np.sum(sub_matrix**2, axis = -1))
print("X_bc - X_br:\n{}\n\nAll distance:\n{}".format(sub_matrix, distance_matrix))
Outputs:
X:
[[2.05479961 2.74672028]
[2.10302902 5.36635678]
[4.82345137 9.0050768 ]]
X broadcast to row:
[[[2.05479961 2.74672028]
[2.10302902 5.36635678]
[4.82345137 9.0050768 ]]
[[2.05479961 2.74672028]
[2.10302902 5.36635678]
[4.82345137 9.0050768 ]]
[[2.05479961 2.74672028]
[2.10302902 5.36635678]
[4.82345137 9.0050768 ]]]
X broadcast to column:
[[[2.05479961 2.74672028]
[2.05479961 2.74672028]
[2.05479961 2.74672028]]
[[2.10302902 5.36635678]
[2.10302902 5.36635678]
[2.10302902 5.36635678]]
[[4.82345137 9.0050768 ]
[4.82345137 9.0050768 ]
[4.82345137 9.0050768 ]]]
X_bc - X_br:
[[[ 0. 0. ]
[-0.04822941 -2.6196365 ]
[-2.76865177 -6.25835652]]
[[ 0.04822941 2.6196365 ]
[ 0. 0. ]
[-2.72042236 -3.63872002]]
[[ 2.76865177 6.25835652]
[ 2.72042236 3.63872002]
[ 0. 0. ]]]
All distance:
[[0. 2.62008043 6.8434245 ]
[2.62008043 0. 4.54323466]
[6.8434245 4.54323466 0. ]]
Upvotes: 0
Reputation: 66
shape = (10,2)
X = np.random.random(shape)
# X[:,0] --> x values
# X[:,1] --> y values
dist = np.sqrt(np.sum((X[:,np.newaxis,:] - X[np.newaxis,:,:]) ** 2, axis = -1))
Upvotes: 1