MustSeeMelons

Reputation: 746

Cluster analysis - finding the center of a cluster

I've implemented two clustering algorithms, k-means and divisive; maybe later I'll add agglomerative as well. I have to analyze how well they perform on high-dimensional data, and for that I have to calculate the average or summed distance to each cluster's center. In the case of k-means it's easy, because I have the centroid, but how do I find the center in the divisive/agglomerative algorithms? While I'm here: I've currently implemented the Euclidean, Manhattan and Pearson distances. Are there any more distance measures I could use? Thanks in advance!

Upvotes: 0

Views: 304

Answers (2)

MustSeeMelons

Reputation: 746

The goal of my work is to analyze how these algorithms behave when they have to build clusters from high-dimensional data. They are hard to evaluate, and it's very unlikely that the result will be completely fair, so I'm going to use the average (or accumulated) distance between records within one cluster and the minimal distance between two records from different clusters. As for finding the center of a cluster in hierarchical clustering algorithms: use the same formula that k-means uses to recalculate the centroid after each iteration, i.e. the component-wise mean of the cluster's members.
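As a rough illustration of the measures described above, here is a minimal Python sketch. It assumes each cluster is simply a list of equal-length numeric tuples, and the function names (`centroid`, `mean_distance_to_centroid`, `mean_pairwise_distance`, `min_inter_cluster_distance`) are made up for this example, not taken from any library. The default metric is Euclidean via `math.dist` (Python 3.8+), but any function of two points can be passed in instead.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(cluster):
    """Component-wise mean of the points in one cluster (same as the k-means update)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def mean_distance_to_centroid(cluster, metric=dist):
    """Average distance from each point to the cluster's centroid."""
    c = centroid(cluster)
    return sum(metric(p, c) for p in cluster) / len(cluster)


def mean_pairwise_distance(cluster, metric=dist):
    """Average distance over all unordered pairs of points in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:  # a single-point cluster has no pairs
        return 0.0
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


def min_inter_cluster_distance(cluster_a, cluster_b, metric=dist):
    """Smallest distance between a point of cluster_a and a point of cluster_b."""
    return min(metric(a, b) for a in cluster_a for b in cluster_b)
```

Because these functions only need the final assignment of points to clusters, they work the same way for k-means, divisive and agglomerative output. Note that the pairwise and inter-cluster measures are quadratic in the number of points, which may matter for large data sets.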

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77474

You may want to get this book:

  • Encyclopedia of Distances, Michel Deza, Elena Deza, 590 pages.

which covers many of the alternate distance functions you could use.

Probably a few hundred different distances ...
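To give a flavour of what else is out there beyond Euclidean, Manhattan and Pearson, here is a small sketch of three other commonly used measures: Chebyshev, Minkowski and cosine distance. The function names and the default `p=3` are my own choices for illustration, and points are assumed to be equal-length sequences of numbers.

```python
from math import sqrt


def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    """Generalizes Manhattan (p=1) and Euclidean (p=2); p=3 is an arbitrary default."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Keep in mind that cosine distance is not a true metric (it does not satisfy the triangle inequality), which can matter if your algorithms rely on metric properties.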

However, you will also need to look into your evaluation method: if it is centroid-based, it will be biased towards k-means, which directly optimizes distances to centroids. So the comparison is likely to be unfair.

Furthermore, if you use artificial data, make sure you do not unfairly favor one method over another because the method correlates with the way you generate your data (e.g. if you generate Gaussian clusters, it favors methods such as k-means).

Upvotes: 1
