YuNo

Reputation: 141

How to do clustering with similarity as a measure?

I read about spherical k-means, but I did not come across an implementation. To be clear, similarity is simply the dot product of two document unit vectors. I have read that standard k-means uses distance as its measure. Is that distance the ordinary (Euclidean) vector distance from coordinate geometry, sqrt((x2 - x1)^2 + (y2 - y1)^2)?

Upvotes: 1

Views: 517

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77454

There are more clustering methods than k-means. The problem with k-means is not so much that it is built on Euclidean distance, but that the mean must reduce the distances for the algorithm to converge.
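For unit vectors specifically, the squared Euclidean distance is ||a - b||^2 = 2 - 2 (a . b), so the nearest centroid by distance is also the most similar one by dot product, and the re-normalized mean of a cluster maximizes the total dot product with its members. That is the idea behind spherical k-means. Below is a minimal Python sketch of it, not a reference implementation: the function names, the fixed iteration cap, and the caller-supplied initial centroids are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity when a and b are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, init_centroids, iterations=20):
    """Sketch of spherical k-means on lists of equal-length numeric lists.

    Choosing init_centroids is left to the caller (an assumption of this
    sketch); a real implementation would add a proper seeding strategy.
    """
    docs = [normalize(d) for d in docs]
    centroids = [normalize(c) for c in init_centroids]
    labels = None
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda j: dot(d, centroids[j]))
            for d in docs
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: mean of the members, projected back to unit length.
        for j in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = spherical_kmeans(docs, init_centroids=[docs[0], docs[2]])
```

With these four toy vectors, the first two documents should land in one cluster and the last two in the other.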

However, there are tons of other clustering algorithms that do not need to compute a mean or rely on the triangle inequality. If you read the Wikipedia article on DBSCAN, it also mentions a version called GDBSCAN, Generalized DBSCAN. You definitely should be able to plug your similarity function into GDBSCAN. Most likely, you could just use 1/similarity (or 1 - similarity, which avoids dividing by zero when two documents have similarity 0) as a distance function, unless the algorithm requires the triangle inequality. So this trick should work with DBSCAN and OPTICS, for example. Probably also with hierarchical clustering, k-medians and k-medoids (PAM).
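To illustrate the "turn the similarity into a distance" trick, here is a minimal Python sketch of k-medoids on a precomputed distance matrix. It uses 1 - similarity rather than 1/similarity, an assumption made here so that a similarity of 0 does not cause a division by zero. It is also the simple alternating variant of k-medoids, not the full PAM swap procedure, and the function names and caller-supplied starting medoids are hypothetical.

```python
def cosine_distance_matrix(vectors):
    """Pairwise 1 - dot product; assumes the vectors are already unit length."""
    return [
        [1.0 - sum(x * y for x, y in zip(a, b)) for b in vectors]
        for a in vectors
    ]


def k_medoids(dist, medoids, iterations=20):
    """Alternating k-medoids on a precomputed distance matrix.

    dist[i][j] is the distance between items i and j. medoids is a list of
    initial item indices, chosen by the caller in this sketch. No mean is
    ever computed, so any dissimilarity can be plugged in.
    """
    n = len(dist)
    medoids = list(medoids)

    def assign(meds):
        # Label each item with the position of its nearest medoid.
        return [
            min(range(len(meds)), key=lambda c: dist[i][meds[c]])
            for i in range(n)
        ]

    for _ in range(iterations):
        labels = assign(medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                best = min(members, key=lambda m: sum(dist[m][o] for o in members))
            else:
                best = old  # keep the old medoid if the cluster emptied out
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


unit_docs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0], [0.0, 0.6, 0.8]]
dist = cosine_distance_matrix(unit_docs)
labels, medoids = k_medoids(dist, medoids=[0, 2])
```

Because the algorithm only ever reads entries of the distance matrix, swapping in a different similarity-to-distance conversion only means changing cosine_distance_matrix.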

Upvotes: 1
