Calculate Mahalanobis distance using NumPy only

Question

I am looking for NumPy way of calculating Mahalanobis distance between two numpy arrays (x and y). The following code can correctly calculate the same using cdist function of Scipy. Since this function calculates unnecessary matix in my case, I want more straight way of calculating it using NumPy only.

import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])
i,j,k = x.shape

xx = x.reshape(i,j*k).T


y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])


yy = y.reshape(i,j*k).T

results =  cdist(xx,yy,'mahalanobis')
results = np.diag(results)
print results



[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

My trial:

VI = np.linalg.inv(np.cov(xx,yy))

print np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T))

Could anybody correct this method?

Here is formula for it:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.mahalanobis.html#scipy.spatial.distance.mahalanobis

xnx · Accepted Answer

I think your problem lies in the construction of your covariance matrix. Try:

X = np.vstack([xx,yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)
print np.diag(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T)))

Output:

[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

To do this without the intermediate array implicitly created here, you might have to sacrifice a C loop for a Python one:

A = np.dot((xx-yy),VI)
B = (xx-yy).T
n = A.shape[0]
D = np.empty(n)
for i in range(n):
    D[i] = np.sqrt(np.sum(A[i] * B[:,i]))

EDIT: actually, with np.einsum voodoo you can remove the Python loop and speed it up a lot (on my system, from 84.3 µs to 2.9 µs):

D = np.sqrt(np.einsum('ij,ji->i', A, B))

EDIT: As @Warren Weckesser points out, einsum can be used to do away with the intermediate A and B arrays too:

delta = xx - yy
D = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))

Calculate Mahalanobis distance using NumPy only

Answers (2)

Related Questions