Syafiq Kamarul Azman
Syafiq Kamarul Azman

Reputation: 862

Vectorized matrix manhattan distance in numpy

I'm trying to implement an efficient vectorized numpy to make a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products as follows:

A = [[1, 2]   
     [2, 1]]

B = [[1, 1],
     [2, 2],
     [1, 3],
     [1, 4]]

def euclidean_distmtx(X, X):
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
    ysq = np.power(Y, 2).sum(axis=1)
    return np.sqrt(xsq + f + ysq)

I want to implement somthing similar but using Manhattan distance instead. So far I've got close but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is

\sum_i |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2| + ...

I tried to solve this by considering if the absolute function didn't apply at all giving me this equivalence

\sum_i x_i - y_i = \sum_i x_i - \sum_i y_i

which gives me the following vectorization

def manhattan_distmtx(X, Y):
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)

I think I'm the right track but I just can't move the values around without removing that absolute function around the difference between each vector elements. I'm sure there's a clever trick around the absolute values, possibly by using np.sqrt of a squared value or something but I can't seem to realize it.

Upvotes: 15

Views: 30735

Answers (1)

Divakar
Divakar

Reputation: 221574

I don't think we can leverage BLAS based matrix-multiplication here, as there's no element-wise multiplication involved here. But, we have few alternatives.

Approach #1

We can use Scipy's cdist that features the Manhattan distance with its optional metric argument set as 'cityblock' -

from scipy.spatial.distance import cdist

out = cdist(A, B, metric='cityblock')

Approach #2 - A

We can also leverage broadcasting, but with more memory requirements -

np.abs(A[:,None] - B).sum(-1)

Approach #2 - B

That could be re-written to use less memory with slicing and summations for input arrays with two cols -

np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])

Approach #2 - C

Porting over the broadcasting version to make use of faster absolute computation with numexpr module -

import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')

Upvotes: 22

Related Questions