Computing Mahalanobis Distance Component Wise

Question

I have 60000 vectors of 784 dimensions. This data has 10 classes.

I must evaluate a function that removes one dimension and then recomputes the distance metric. This function computes the distance of each vector to its class's mean. In code:

def objectiveFunc(self, X, y, indices):

    subX = np.array([X[:,i] for i in indices]).T
    d = np.zeros((10,1))
    for n in range(10):
        C = subX[np.where(y == n)]
        u = np.mean(C, axis = 0)
        Sinv = pinv(covariance(C))
        d[n] = np.mean(np.apply_along_axis(mahalanobis, axis = 1, arr=C, v=u, VI=Sinv))

where indices are fed in with one index removed during each iteration.

As you can imagine, I end up recomputing many of the same individual components every time I compute the Mahalanobis distance. Is there a way for me to store all 784 component distances?

Alternatively, what's the fastest way to compute Mahalanobis distance?

razimbres · Accepted Answer

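First of all, to make this easier to follow, here is the Mahalanobis distance formula for a vector $x$, class mean $\mu$, and class covariance matrix $S$:

$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$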

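So, to compute a Mahalanobis distance for each element relative to its class, we can do the following. Note that this code uses the one-dimensional form of the distance: each sample is summarised by its mean pixel value, which is compared with the mean and variance of all pixel values in its class: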

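import numpy as np

# Flatten each image into a 784-dimensional row vector
X_train = X_train.reshape(-1, 784)

def mahalanobis(element, classe):
    # Row indices of the samples that belong to class `classe`
    part = np.where(y_train == classe)[0]
    # Scalar mean over all pixel values of that class
    ave = np.mean(X_train[part])
    # One-dimensional distance: |sample mean - class mean| / class std
    distance_example = np.sqrt(((np.mean(X_train[part[[element]]]) - ave)**2) / np.var(X_train[part]))
    return distance_example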

mahalanobis(20,2)    
# Out[91]: 0.13947337027828757

Then you can loop over the elements (here with a list comprehension) to calculate all the distances. For instance, for class 0:

[mahalanobis(i,0) for i in range(0,len(X_train[np.where(y_train==0)[0]]))]
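
If you need the full multivariate distance from the question rather than the scalar version above, here is a minimal vectorized sketch. It is not the code from either post: `class_mahalanobis`, `drop_one_feature`, and the `ridge` parameter are hypothetical names of mine. I am assuming a small ridge term in place of `pinv` so that the covariance is invertible, because the shortcut for dropping a feature, $d^2_{-j} = d^2 - (Pz)_j^2 / P_{jj}$ with $P = S^{-1}$ and $z = x - \mu$, only holds for an invertible covariance. Under that assumption, one matrix inverse per class should give the distances for every "one dimension removed" case at once.

```python
import numpy as np

def class_mahalanobis(C, ridge=1e-6):
    """Squared Mahalanobis distance of every row of C to the mean of C.

    C has shape (n_samples, n_features). `ridge` is an assumed regulariser
    (not in the original posts) that keeps the covariance invertible.
    """
    Z = C - C.mean(axis=0)                     # centred samples
    S = np.cov(C, rowvar=False) + ridge * np.eye(C.shape[1])
    P = np.linalg.inv(S)                       # precision matrix
    ZP = Z @ P                                 # row n holds P @ z_n (P is symmetric)
    d2 = np.einsum("ij,ij->i", ZP, Z)          # z_n^T P z_n for every sample
    return d2, ZP, P

def drop_one_feature(d2, ZP, P):
    """Squared distances after removing each feature in turn.

    Column j is the squared distance recomputed without feature j,
    obtained as d2 - (P z)_j**2 / P_jj. Shape: (n_samples, n_features).
    """
    d2_drop = d2[:, None] - ZP**2 / np.diag(P)
    return np.maximum(d2_drop, 0.0)            # clip tiny negative rounding errors

# Synthetic stand-in for the samples of one class
rng = np.random.default_rng(0)
C = rng.normal(size=(500, 8))
d2, ZP, P = class_mahalanobis(C)
# Entry j: mean distance to the class mean when feature j is left out
mean_dist_without = np.sqrt(drop_one_feature(d2, ZP, P)).mean(axis=0)
```

For the real data you would call this once per class on `X[y == n]`, which gives a 784-long vector of mean distances per class without re-inverting the covariance for every removed index.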
