Reputation: 8109
What is a good clustering algorithm that simply puts two data items into the same cluster if their separation is less than some user specified cutoff?
i.e. the result of X_clustering(data, distance, epsilon)
is a set of cluster assignments such that for any pair i,j
, they are in the same cluster if distance(data[i], data[j]) < epsilon
. If distance(data[i], data[j]) >= epsilon
they can be in different clusters (if there aren't other data that end up linking them...).
Another way of stating it is: i,j
are in the same cluster if there exists a path [i, x, y, z..., j]
through the data such that each step is of distance<epsilon
, and they are in different clusters if no such path exists.
Upvotes: 1
Views: 51
Reputation: 18858
Your idea is not working. If for all pairs of (data[i], data[j])
in a cluster, their distance is less than a given epsilon
, it means all members of that cluster are located in a circle with the radius of epsilon
. Hence, this clustering method cannot be generalized.
By the way, DBSCAN is a good clustering algorithm in case of determining clusters base on their density with a given epsilon
. You can modify this algorithm by adding a stronger constraint such that:
each data can be added to a cluster if its distance to all members of the cluster is less than a given
epsilon
.
Upvotes: 1