kmario23

Reputation: 61355

Efficient way to compute distance matrix in NumPy

I have this sample array:

In [38]: arr
Out[38]: array([  0,  44, 121, 154, 191])

This is just a sample; my actual array is much larger. What is an efficient way to compute a distance matrix, i.e. the matrix of pairwise differences between its elements?

The result should be:

In [41]: res
Out[41]: 
array([[   0,   44,  121,  154,  191],
       [ -44,    0,   77,  110,  147],
       [-121,  -77,    0,   33,   70],
       [-154, -110,  -33,    0,   37],
       [-191, -147,  -70,  -37,    0]])

I wrote a for-loop-based implementation, but it is too slow. Can this be vectorized?

Upvotes: 1

Views: 350

Answers (3)

user20068036

Reputation: 71

import numpy as np

def dist_calc(
    arr1: np.ndarray[tuple[int, int], np.dtype[int | float]],
    arr2: np.ndarray[tuple[int, int], np.dtype[int | float]],
) -> np.ndarray[tuple[int, int], np.dtype[float]]:
    """
    calculates euclidean distances between all points in two k-dimensional arrays
    'arr1' and 'arr2'
        :parameter
            - arr1: N x k array
            - arr2: M x k array
        :return
            - dist: M x N array with pairwise distances
    """
    norm_1 = np.sum(arr1 * arr1, axis=1).reshape(1, -1)
    norm_2 = np.sum(arr2 * arr2, axis=1).reshape(-1, 1)

    dist = (norm_1 + norm_2) - 2.0 * np.dot(arr2, arr1.T)
    # necessary due to limited numerical accuracy
    dist[dist < 1.0e-11] = 0.0

    return np.sqrt(dist)

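# arr from the question is 1-D, so reshape it to N x 1 (N points, k = 1)
res = dist_calc(arr.reshape(-1, 1), arr.reshape(-1, 1))

This is a more general answer where the arrays don't have to be one-dimensional. Note that it returns Euclidean (non-negative) distances, so for the 1-D sample it gives the absolute values of the signed differences shown in the question.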
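To make the relationship between the two results concrete, here is a minimal sketch, assuming the question's sample array and treating each scalar as a one-dimensional point. It repeats the norm expansion inline rather than calling dist_calc, and the names pts, sq_dist, euclid and signed are only illustrative.

```python
import numpy as np

arr = np.array([0, 44, 121, 154, 191])

# Treat each scalar as a 1-dimensional point: shape (N,) -> (N, 1).
pts = arr.reshape(-1, 1).astype(float)

# Norm expansion: |a - b|^2 = |a|^2 + |b|^2 - 2 * a.b
sq = np.sum(pts * pts, axis=1)
sq_dist = sq[None, :] + sq[:, None] - 2.0 * (pts @ pts.T)
sq_dist[sq_dist < 1.0e-11] = 0.0  # clamp round-off before the square root
euclid = np.sqrt(sq_dist)

# Signed differences as requested in the question: signed[i, j] = arr[j] - arr[i]
signed = arr[None, :] - arr[:, None]

# The Euclidean result matches the signed result only up to sign.
print(np.allclose(euclid, np.abs(signed)))
```

If the signed matrix from the question is what you need, the Euclidean route loses the sign, so plain broadcasted subtraction is the closer fit.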

Upvotes: 1

cs95

Reputation: 402483

There's np.subtract.outer, which effectively performs broadcasted subtraction between two arrays. From the NumPy documentation for ufunc.outer:

Apply the ufunc op to all pairs (a, b) with a in A and b in B.

Let M = A.ndim, N = B.ndim. Then the result, C, of op.outer(A, B) is an array of dimension M + N such that:

C[i_0, ..., i_{M-1}, j_0, ..., j_{N-1}] =
     op(A[i_0, ..., i_{M-1}], B[j_0, ..., j_{N-1}])

np.subtract.outer(arr, arr).T

Or,

arr - arr[:, None] # essentially the same thing as above

array([[   0,   44,  121,  154,  191],
       [ -44,    0,   77,  110,  147],
       [-121,  -77,    0,   33,   70],
       [-154, -110,  -33,    0,   37],
       [-191, -147,  -70,  -37,    0]])
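As a quick check of why the transpose is needed, here is a small sketch, assuming the sample array from the question. The variable names outer, res_outer and res_bcast are only illustrative.

```python
import numpy as np

arr = np.array([0, 44, 121, 154, 191])

# ufunc.outer puts the first argument on the row axis:
# outer[i, j] = arr[i] - arr[j]
outer = np.subtract.outer(arr, arr)

# The question wants res[i, j] = arr[j] - arr[i], hence the transpose.
res_outer = outer.T

# Broadcasting: (N,) minus (N, 1) expands to (N, N) with the same layout.
res_bcast = arr - arr[:, None]

print(np.array_equal(res_outer, res_bcast))
```

Because subtraction is antisymmetric, negating the outer result (-outer) should give the same matrix as transposing it.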

Upvotes: 1

Bi Rico

Reputation: 25813

You can use broadcasting:

from numpy import array

arr = array([  0,  44, 121, 154, 191])
arrM = arr.reshape(1, len(arr))
res = arrM - arrM.T
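The shapes involved can be spelled out with a short sketch, assuming the same sample array with an explicit int64 dtype. Since the question mentions a very large array, it also shows that the full result holds N * N elements, which is worth estimating before materializing it. The names row, col and n are only illustrative.

```python
import numpy as np

arr = np.array([0, 44, 121, 154, 191], dtype=np.int64)

row = arr.reshape(1, -1)   # shape (1, N)
col = row.T                # shape (N, 1)
res = row - col            # broadcasts to (N, N); res[i, j] = arr[j] - arr[i]

n = arr.size
print(row.shape, col.shape, res.shape)

# Memory grows quadratically: N * N elements of arr.itemsize bytes each.
print(res.nbytes == n * n * arr.itemsize)
```

For large N, a smaller dtype (when the value range allows it) or computing the matrix a block of rows at a time are possible ways to keep memory in check.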

Upvotes: 2
