Reputation: 4813
I've done Kmeans clustering in OpenCV using C++ and have 12 cluster centers (each in 200 dimensions).
Now, I have a set of points in 200 dimensions and I'm trying to find the closest cluster (Vector Quantization).
Which distance is preferred over the other (Mahalanobis distance or Euclidean distance)? Currently I'm using Euclidean distance.
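For reference, the Euclidean nearest-center lookup I'm doing can be sketched like this (plain C++ rather than OpenCV types, with a hypothetical `nearestCenter` helper; the squared distance is enough since the square root doesn't change the argmin):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Return the index of the cluster center nearest to `point`
// under squared Euclidean distance. Skipping the square root
// is safe because sqrt is monotonic, so the argmin is the same.
std::size_t nearestCenter(const std::vector<double>& point,
                          const std::vector<std::vector<double>>& centers) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < centers.size(); ++i) {
        double d = 0.0;
        for (std::size_t j = 0; j < point.size(); ++j) {
            double diff = point[j] - centers[i][j];
            d += diff * diff;  // accumulate squared differences per dimension
        }
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```

In my case `point` has 200 dimensions and `centers` holds the 12 k-means centroids.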
Upvotes: 3
Views: 1692
Reputation: 7138
Andrey's point is a valid one. I can add a general statement:
For Mahalanobis distance you need to be able to properly estimate the covariance matrix for each cluster. With 200 dimensions, the only way you can expect a reasonable estimate of a cluster's covariance matrix is with something on the order of several hundred to a few thousand datapoints per cluster. Multiply that by the 12 clusters you have and you easily need tens of thousands of datapoints to use Mahalanobis distance reasonably.
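To make the definition concrete, here is a minimal 2-D sketch of the Mahalanobis distance, sqrt((x-mu)^T S^{-1} (x-mu)), with the 2x2 covariance inverted in closed form (the function name and 2-D restriction are for illustration only; in 200 dimensions you would need a proper matrix library and, as argued above, far more data to estimate S):

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

// Mahalanobis distance in 2D: sqrt((x - mu)^T * S^{-1} * (x - mu)),
// where S is the 2x2 covariance matrix of the cluster.
double mahalanobis2d(const std::array<double, 2>& x,
                     const std::array<double, 2>& mu,
                     const std::array<std::array<double, 2>, 2>& S) {
    double det = S[0][0] * S[1][1] - S[0][1] * S[1][0];
    if (std::fabs(det) < 1e-12)
        throw std::runtime_error("covariance matrix is singular");
    // Closed-form inverse of a 2x2 matrix: (1/det) * [ d -b; -c a ]
    double inv00 =  S[1][1] / det, inv01 = -S[0][1] / det;
    double inv10 = -S[1][0] / det, inv11 =  S[0][0] / det;
    double d0 = x[0] - mu[0], d1 = x[1] - mu[1];
    // Quadratic form (x - mu)^T S^{-1} (x - mu)
    double q = d0 * (inv00 * d0 + inv01 * d1)
             + d1 * (inv10 * d0 + inv11 * d1);
    return std::sqrt(q);
}
```

Note how a large variance along one axis shrinks that axis's contribution to the distance, which is exactly why a badly estimated covariance matrix distorts the metric.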
Apart from that: see how Euclidean distance works for you. If the results are reasonable, just stick with that; otherwise try Mahalanobis.
Finally, you might find more knowledgeable people on this subject on the stats stackexchange.
Upvotes: 4
Reputation: 20915
That is impossible to answer without knowing the context. There is no such thing as a good or bad metric; each one is better suited to a specific class of problems.
Upvotes: 4