Reputation: 862
I'm trying to implement an efficient vectorized numpy
to make a Manhattan distance matrix. I'm familiar with the construct used to create an efficient Euclidean distance matrix using dot products as follows:
A = [[1, 2]
[2, 1]]
B = [[1, 1],
[2, 2],
[1, 3],
[1, 4]]
def euclidean_distmtx(X, X):
f = -2 * np.dot(X, Y.T)
xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))
ysq = np.power(Y, 2).sum(axis=1)
return np.sqrt(xsq + f + ysq)
I want to implement somthing similar but using Manhattan distance instead. So far I've got close but fell short trying to rearrange the absolute differences. As I understand it, the Manhattan distance is
I tried to solve this by considering if the absolute function didn't apply at all giving me this equivalence
which gives me the following vectorization
def manhattan_distmtx(X, Y):
f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
return f / Y.sum(axis=1) - Y.sum(axis=1)
I think I'm the right track but I just can't move the values around without removing that absolute function around the difference between each vector elements. I'm sure there's a clever trick around the absolute values, possibly by using np.sqrt
of a squared value or something but I can't seem to realize it.
Upvotes: 15
Views: 30735
Reputation: 221574
I don't think we can leverage BLAS based matrix-multiplication here, as there's no element-wise multiplication involved here. But, we have few alternatives.
Approach #1
We can use Scipy's cdist
that features the Manhattan distance with its optional metric argument set as 'cityblock'
-
from scipy.spatial.distance import cdist
out = cdist(A, B, metric='cityblock')
Approach #2 - A
We can also leverage broadcasting
, but with more memory requirements -
np.abs(A[:,None] - B).sum(-1)
Approach #2 - B
That could be re-written to use less memory with slicing and summations for input arrays with two cols -
np.abs(A[:,0,None] - B[:,0]) + np.abs(A[:,1,None] - B[:,1])
Approach #2 - C
Porting over the broadcasting
version to make use of faster absolute
computation with numexpr
module -
import numexpr as ne
A3D = A[:,None]
out = ne.evaluate('sum(abs(A3D-B),2)')
Upvotes: 22