Mahalanobis distance in Python returns a matrix instead of a distance

Question

This should be a simple question: either I am missing some information, or I have miscoded this.

I am trying to implement the Mahalanobis distance in Python, following the formula given on Wikipedia.
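For reference, the standard definition of the Mahalanobis distance between two vectors x and y, given the covariance matrix S of the data, is:

```latex
d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^{\mathsf{T}} \, S^{-1} \, (\vec{x} - \vec{y})}
```

Here x - y is an n×1 column vector, so the product has shape (1×n)(n×n)(n×1), which is a single 1×1 value, and S^{-1} is the matrix inverse of S, not the element-wise reciprocal of its entries.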

My code is as follows:

import numpy as np

a = np.array([[1, 3, 5]])
b = np.array([[4, 5, 6]])

X = np.empty((0,3), float)
X = np.vstack([X, [2,3,4]])
X = np.vstack([X, a])
X = np.vstack([X, b])

n = ((a-b).T)*(np.cov(X)**-1)*(a-b)
dist = np.sqrt(n)

dist is a 3x3 array, but shouldn't I expect a single number representing the distance?
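A quick way to see where the 3x3 comes from is to print the shapes involved. This sketch rebuilds the question's a, b and X (X via a single np.vstack call, which gives the same rows); the names d and cov_inv are only illustrative. Because (a-b).T has shape (3, 1) and (a-b) has shape (1, 3), * broadcasts them element-wise into a (3, 3) array instead of reducing to a scalar.

```python
import numpy as np

a = np.array([[1, 3, 5]])
b = np.array([[4, 5, 6]])
X = np.vstack([[2, 3, 4], a, b])   # same rows as the question's X

d = a - b                          # shape (1, 3): a row vector
cov_inv = np.cov(X) ** -1          # element-wise reciprocal, shape (3, 3)

print(d.T.shape)                   # (3, 1)
print(d.shape)                     # (1, 3)

# '*' is element-wise with broadcasting: (3, 1) * (3, 3) * (1, 3) -> (3, 3)
n = d.T * cov_inv * d
print(n.shape)                     # (3, 3)
```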

dist = array([[ 1.5       ,  1.73205081,  1.22474487],
              [ 1.73205081,  2.        ,  1.41421356],
              [ 1.22474487,  1.41421356,  1.        ]])

Wikipedia does not suggest (to me) that it should return a matrix. Googling for implementations of the Mahalanobis distance in Python, I have not found anything to compare it to.

Anton Protopopov · Accepted Answer

From the Wikipedia page you can see that a and b are vectors, but in your case they are 2-D arrays of shape (1, 3), i.e. row vectors. So the transpose has to go on the other factor: (a-b) times the matrix times (a-b).T. There should also be a matrix multiplication: in NumPy, * means element-wise multiplication; for matrix multiplication use the np.dot function or the .dot method of np.array. For your case the answer is:
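The same idea can be sketched with 1-D vectors and the @ operator (matrix multiplication, available since Python 3.5 and equivalent to np.dot here). With 1-D arrays no explicit transpose is needed, and the result is a scalar instead of a 1x1 array. This sketch deliberately keeps the answer's np.cov(X) ** -1 so the numbers match the output that follows; the formula itself calls for a true matrix inverse.

```python
import numpy as np

a = np.array([1, 3, 5])            # 1-D vectors instead of (1, 3) arrays
b = np.array([4, 5, 6])
X = np.vstack([[2, 3, 4], a, b])

d = a - b                          # shape (3,)
M = np.cov(X) ** -1                # element-wise reciprocal, as in the answer's first version

n = d @ M @ d                      # (3,) @ (3, 3) @ (3,) -> 0-d scalar
print(n)                           # 25.0
print(np.sqrt(n))                  # 5.0
```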

n = (a-b).dot((np.cov(X)**-1).dot((a-b).T))
dist = np.sqrt(n)

In [54]: n
Out[54]: array([[ 25.]])

In [55]: dist
Out[55]: array([[ 5.]])

EDIT

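As @roadrunner66 noticed, you should use the inverse of the matrix rather than the element-wise inverse (np.cov(X)**-1 just takes the reciprocal of each entry). Usually np.linalg.inv works for this, but here the covariance matrix is singular, so np.linalg.inv raises LinAlgError: Singular matrix, and you need the pseudo-inverse np.linalg.pinv instead: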

n = (a-b).dot((np.linalg.pinv(np.cov(X))).dot((a-b).T))
dist = np.sqrt(n)

In [90]: n
Out[90]: array([[ 1.77777778]])

In [91]: dist
Out[91]: array([[ 1.33333333]])
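Putting the pieces together, here is a minimal sketch of a reusable function. The name mahalanobis and its rowvar parameter are illustrative, not from any library. It uses np.linalg.pinv, which agrees with np.linalg.inv whenever the covariance is invertible and still returns a result when it is singular, as it is here: every row of np.cov(X) is a multiple of [1, 2, 1], so the matrix has rank 1. One assumption worth checking against your own data: np.cov treats each row of its input as a variable by default, which is what np.cov(X) above does; if instead each row of X is one observation (one point), pass rowvar=False. The two conventions generally give different distances.

```python
import numpy as np

def mahalanobis(u, v, data, rowvar=True):
    """Mahalanobis distance between vectors u and v (illustrative helper).

    The covariance is estimated from `data` with np.cov; `rowvar` is passed
    straight through (True: each row is a variable, NumPy's default;
    False: each row is an observation). The pseudo-inverse is used so a
    singular covariance matrix does not raise LinAlgError.
    """
    d = np.ravel(u) - np.ravel(v)          # flatten to 1-D so the result is a scalar
    cov = np.cov(data, rowvar=rowvar)
    cov_inv = np.linalg.pinv(cov)          # matches inv(cov) when cov is invertible
    return float(np.sqrt(d @ cov_inv @ d))


a = np.array([[1, 3, 5]])
b = np.array([[4, 5, 6]])
X = np.vstack([[2, 3, 4], a, b])

print(np.linalg.matrix_rank(np.cov(X)))    # 1, so np.linalg.inv would fail
print(mahalanobis(a, b, X))                # about 1.3333, matching the EDIT above
print(mahalanobis(a, b, X, rowvar=False))  # rows of X treated as points instead
```

If SciPy is available, scipy.spatial.distance.mahalanobis(u, v, VI) gives a ready-made version to compare against; note that it expects the inverse covariance VI, not the covariance matrix itself.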
