Reputation: 61355
I have this sample array:
In [38]: arr
Out[38]: array([ 0, 44, 121, 154, 191])
The above is just a sample; my actual array is much larger. What is an efficient way to compute a distance matrix of signed pairwise differences?
The result should be:
In [41]: res
Out[41]:
array([[   0,   44,  121,  154,  191],
       [ -44,    0,   77,  110,  147],
       [-121,  -77,    0,   33,   70],
       [-154, -110,  -33,    0,   37],
       [-191, -147,  -70,  -37,    0]])
I wrote a for-loop-based implementation, which is too slow. Could this be vectorized for efficiency?
Upvotes: 1
Views: 350
Reputation: 71
import numpy as np

def dist_calc(
    arr1: np.ndarray[tuple[int, int], np.dtype[int | float]],
    arr2: np.ndarray[tuple[int, int], np.dtype[int | float]],
) -> np.ndarray[tuple[int, int], np.dtype[float]]:
    """
    calculates euclidean distances between all points in two k-dimensional arrays
    'arr1' and 'arr2'
    :parameter
        - arr1: N x k array
        - arr2: M x k array
    :return
        - dist: M x N array with pairwise distances
    """
    # squared norms of each point, shaped so they broadcast to M x N
    norm_1 = np.sum(arr1 * arr1, axis=1).reshape(1, -1)
    norm_2 = np.sum(arr2 * arr2, axis=1).reshape(-1, 1)
    # |a - b|^2 = |a|^2 + |b|^2 - 2 * a.b
    dist = (norm_1 + norm_2) - 2.0 * np.dot(arr2, arr1.T)
    # necessary due to limited numerical accuracy
    dist[dist < 1.0e-11] = 0.0
    return np.sqrt(dist)
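res = dist_calc(arr.reshape(-1, 1), arr.reshape(-1, 1))
This is a more general answer in which the points don't have to be one-dimensional. The inputs must be 2-D, so the 1-D sample array is reshaped into a column first. Note that the result holds non-negative Euclidean distances, not the signed differences shown in the question.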
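As a sanity check on the expanded-norm formula, the same distances can be computed by direct broadcasting. This is only a sketch: dist_direct is a hypothetical helper name that does not appear in the answer above, and it builds an M x N x k intermediate, so it is likely to use more memory than the dot-product version.

```python
import numpy as np

def dist_direct(arr1, arr2):
    """Pairwise Euclidean distances; arr1 is N x k, arr2 is M x k, result is M x N."""
    # (M, 1, k) - (1, N, k) broadcasts to (M, N, k)
    diff = arr2[:, None, :] - arr1[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))

arr = np.array([0, 44, 121, 154, 191])
col = arr.reshape(-1, 1).astype(float)  # 1-D sample as a 5 x 1 array of points
d = dist_direct(col, col)
```

For the 1-D sample, d should match the absolute value of the signed difference matrix from the question.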
Upvotes: 1
Reputation: 402483
There's subtract.outer, which effectively performs broadcasted subtraction between two arrays.
Apply the ufunc op to all pairs (a, b) with a in A and b in B.
Let M = A.ndim, N = B.ndim. Then the result, C, of op.outer(A, B) is an array of dimension M + N such that:
C[i_0, ..., i_{M-1}, j_0, ..., j_{N-1}] = op(A[i_0, ..., i_{M-1}], B[j_0, ..., j_{N-1}])
np.subtract.outer(arr, arr).T
Or,
arr - arr[:, None] # essentially the same thing as above
array([[   0,   44,  121,  154,  191],
       [ -44,    0,   77,  110,  147],
       [-121,  -77,    0,   33,   70],
       [-154, -110,  -33,    0,   37],
       [-191, -147,  -70,  -37,    0]])
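To see why the two expressions agree, here is a small self-contained check. The variable names outer_res, bcast_res, same and unsigned are illustrative only. The transpose is needed because subtract.outer(arr, arr)[i, j] is arr[i] - arr[j], whereas the expected result has arr[j] - arr[i] at position [i, j].

```python
import numpy as np

arr = np.array([0, 44, 121, 154, 191])

# outer(arr, arr)[i, j] == arr[i] - arr[j]; transposing flips the sign pattern
outer_res = np.subtract.outer(arr, arr).T

# shape (5,) minus shape (5, 1) broadcasts to (5, 5): [i, j] == arr[j] - arr[i]
bcast_res = arr - arr[:, None]

same = np.array_equal(outer_res, bcast_res)

# if unsigned distances are wanted instead, take the absolute value
unsigned = np.abs(bcast_res)
```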
Upvotes: 1
Reputation: 25813
You can use broadcasting:
from numpy import array
arr = array([ 0, 44, 121, 154, 191])
arrM = arr.reshape(1, len(arr))
res = arrM - arrM.T
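A short sketch of the shapes involved, which may help when the real array is large: the result always has n * n elements, so memory grows quadratically with the input length. The name bytes_needed is just an illustrative variable.

```python
import numpy as np

arr = np.array([0, 44, 121, 154, 191])

arrM = arr.reshape(1, len(arr))   # row vector, shape (1, 5)
res = arrM - arrM.T               # (1, 5) - (5, 1) broadcasts to (5, 5)

# size of the full result in bytes: n * n elements of the array's dtype
n = len(arr)
bytes_needed = n * n * res.dtype.itemsize
```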
Upvotes: 2