Reputation: 746
I've created two clustering algorithms, k-means and divisive; maybe later I'll add agglomerative as well. I have to analyze how well they perform on high-dimensional data, and for that I have to calculate the average/sum distance to each cluster's center. For k-means it's easy, since I have the centroid, but how do I find the center in the divisive/agglomerative algorithm? While I'm here: I've currently implemented the Euclidean, Manhattan and Pearson distances. Are there any other distance measures I could use? Thanks in advance!
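For reference, here is a minimal sketch of the three distances mentioned above plus the sum/average distance to a cluster's centroid, assuming each record is a plain list of floats and the centroid is the coordinate-wise mean (as in k-means). The function names are illustrative, not taken from the asker's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a: Vector, b: Vector) -> float:
    """1 - Pearson correlation: 0 for perfectly correlated vectors, 2 for anti-correlated.
    Undefined (division by zero) if either vector is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (std_a * std_b)

def centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of the points (the k-means centroid)."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def distances_to_center(points: List[Vector], dist=euclidean):
    """Return (sum, average) distance from each point to the cluster centroid."""
    c = centroid(points)
    total = sum(dist(p, c) for p in points)
    return total, total / len(points)
```

For example, the square cluster `[[0, 0], [2, 0], [0, 2], [2, 2]]` has centroid `[1.0, 1.0]`, and every point is `sqrt(2)` away from it in Euclidean distance.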
Upvotes: 0
Views: 304
Reputation: 746
The goal of my work is to analyze these clustering algorithms when they have to build clusters from high-dimensional data. Such results are hard to evaluate, and it's very unlikely that the evaluation will be completely fair, so I'm going to use two measures: the average (and accumulated) distance between records in the same cluster, and the minimal distance between two records from different clusters. As for finding the center of a cluster in hierarchical clustering algorithms: I use the same formula as in k-means, the one used to recalculate the centroid after each iteration (the coordinate-wise mean of the cluster's records).
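The two measures described above can be sketched as follows, assuming a clustering is given as a list of clusters and each cluster is a list of records (lists of floats). Because they only use pairwise distances, they work the same way for k-means, divisive or agglomerative output. The function names are hypothetical, and Euclidean distance is used only as a default.

```python
import itertools
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def intra_cluster_stats(cluster: List[Vector], dist: Distance = euclidean):
    """Sum and average distance over all unordered pairs of records in one cluster."""
    pairs = list(itertools.combinations(cluster, 2))
    if not pairs:
        return 0.0, 0.0  # a singleton cluster has no pairs
    total = sum(dist(a, b) for a, b in pairs)
    return total, total / len(pairs)

def min_inter_cluster_distance(clusters: List[List[Vector]], dist: Distance = euclidean) -> float:
    """Smallest distance between two records that belong to different clusters."""
    best = math.inf
    for c1, c2 in itertools.combinations(clusters, 2):
        for a in c1:
            for b in c2:
                best = min(best, dist(a, b))
    return best
```

Both functions compare every pair of records, so they cost O(n^2) distance evaluations; on large data sets a random sample of records may be necessary.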
Upvotes: 0
Reputation: 77474
You may want to get this book:
which covers many of the alternate distance functions you could use.
It probably lists a few hundred different distances.
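As a small illustration (these are common textbook distances, not necessarily the ones the book emphasizes), here are four measures beyond Euclidean, Manhattan and Pearson: Minkowski (which generalizes the first two), Chebyshev, cosine distance and Canberra. Records are assumed to be lists of floats.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def minkowski(a: Vector, b: Vector, p: float = 3.0) -> float:
    """L_p distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a: Vector, b: Vector) -> float:
    """Largest coordinate difference (the limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; compares direction only, ignores vector length.
    Undefined for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def canberra(a: Vector, b: Vector) -> float:
    """Sum of |x - y| / (|x| + |y|), skipping coordinates where both are zero."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total
```

Cosine distance is often suggested for high-dimensional, sparse data (e.g. text vectors) because it ignores vector magnitude; whether it suits this data set is something to test rather than assume.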
However, you will also need to look at your evaluation method: if it is centroid-based, it will be biased towards k-means, so the comparison is likely unfair.
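One evaluation measure that does not use centroids at all is the silhouette coefficient (a swapped-in example, not something the answer names). Below is a minimal sketch assuming clusters are lists of float records and at least two clusters exist. Note that it still has its own preferences (it tends to reward compact, well-separated clusters), so it does not remove the fairness concern entirely.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(clusters: List[List[Vector]],
               dist: Callable[[Vector, Vector], float] = euclidean) -> float:
    """Mean silhouette coefficient over all records (range -1..1, higher is better).
    Requires at least two clusters; min() raises ValueError otherwise."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i, p in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # convention: singleton clusters score 0
                continue
            # a: mean distance to the other members of p's own cluster
            a = sum(dist(p, q) for j, q in enumerate(cluster) if j != i) / (len(cluster) - 1)
            # b: mean distance to the members of the nearest other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

A well-separated clustering such as `[[[0, 0], [0, 1]], [[10, 0], [10, 1]]]` scores about 0.9, while swapping the members so each cluster spans both groups gives a negative score.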
Furthermore, if you use artificial data, make sure you do not unfairly favor one method over another because the method's assumptions match the way you generate your data (e.g. if you generate Gaussian clusters, that favors methods such as k-means).
Upvotes: 1