Yeshi

Reputation: 11

Comparing k-means clustering

I have 150 images, 15 each of 10 different people, so I know which images should belong together if clustered correctly.

Each image is represented by a 73-dimensional feature vector, and I clustered them into 10 clusters using the kmeans function in MATLAB.

Later, I processed these 150 data points, reducing their dimension from 73 to 3, and applied the same kmeans function to them.

I want to compare the results obtained on these two data sets (processed and unprocessed) with the same kmeans function, to find out whether the dimensionality reduction improves the k-means clustering or not.

I thought the variance of each cluster could be one parameter for comparison, but I am not sure I can directly compare metrics such as the within-cluster sum of distances, since the two cases have different dimensions. Could anyone suggest a way to compare the k-means results, some way to normalize them, or any other comparison I can make?

Upvotes: 1

Views: 1461

Answers (1)

John

Reputation: 5905

I can think of three options. I am unaware of any well-developed methodology for doing this specifically with k-means clustering.

  1. Look at the confusion matrices between the two approaches.
  2. Compare the Mahalanobis distances between the clusters, and from items in each cluster to their nearest other clusters.
  3. Look at the Voronoi cells and see how far your points are from the boundaries of the cells.
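For option 1, since you know the true identities, you can cross-tabulate the cluster assignments from the two runs (or each run against the ground truth). Here is a minimal sketch in Python with NumPy for illustration (your workflow is in MATLAB, and the label arrays below are hypothetical); note that k-means cluster indices are arbitrary, so to read the matrix you would still need to match clusters between runs, e.g. by permuting columns to maximize the diagonal:

```python
import numpy as np

def confusion_matrix(labels_a, labels_b, k):
    """Count how often a point assigned to cluster i in one run
    lands in cluster j in the other run."""
    cm = np.zeros((k, k), dtype=int)
    for a, b in zip(labels_a, labels_b):
        cm[a, b] += 1
    return cm

# Hypothetical cluster assignments for 6 points from the 73-D and 3-D runs.
labels_73d = np.array([0, 0, 1, 1, 2, 2])
labels_3d  = np.array([1, 1, 0, 0, 2, 2])

cm = confusion_matrix(labels_73d, labels_3d, k=3)
print(cm)
```

If the two clusterings agree up to a relabeling, every row and column has exactly one non-zero entry; off-diagonal spread (after the best relabeling) shows where points moved between clusters.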

The problem with 3 is that the distance metrics get skewed: 3-D distances and 73-D distances are not commensurate, so I'm not a fan of that approach. I'd recommend reading some books on k-means if you are set on that path; rank speculation is fun, but standing on the shoulders of giants is better.
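Option 2 sidesteps the commensurability problem, because the Mahalanobis distance normalizes each coordinate by the cluster's own covariance and is therefore unitless in any dimension. A minimal sketch (again in Python with NumPy for illustration; the 2-D toy cluster is made up, and the same formula applies in 3 or 73 dimensions):

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance from point x to a cluster with mean mu
    and covariance cov: sqrt((x - mu)^T cov^-1 (x - mu))."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Toy elongated 2-D cluster standing in for one k-means cluster.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2)) * [2.0, 0.5]
mu, cov = pts.mean(axis=0), np.cov(pts.T)

d = mahalanobis(np.array([2.0, 0.0]), mu, cov)
```

One caveat: with only 15 points per cluster in 73 dimensions, the sample covariance of a single cluster is singular, so in the high-dimensional case you would need a pooled or regularized covariance estimate for the inverse to exist.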

Upvotes: 1
