Maciek D.
Maciek D.

Reputation: 2864

Effective clustering algorithm

I need help (preferably a full algorithm, but any hint or reference will be appreciated) with the following algorithmic problem:

We have a set of N elements. I can define a distance between any two elements, which satisfies the metric conditions. I need to group these elements into disjoint subsets (each element belonging to exactly one subset) according to the following rules:

  1. The maximum distance between any two elements in each subset does not exceed specified threshold.

  2. The number of the subsets is as small as possible.

  3. If there is more than one possible grouping satisfying conditions (1) and (2), the maximum distance between any two elements in each subset should be as small as possible.

Example:

Assume we have the following points on a number axis: 1, 11, 12, 13, 23. The distance is simple the the difference between the points. Our distance threshold is 10. The two possible grouping satisfying conditions (1) and (2) are: (1, 11), (12), (13, 23) or (1), (11, 12, 13), (23). However, the condition (3) says that the latter grouping is the correct one.

Upvotes: 2

Views: 105

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77485

In 1 dimensional data, sort your data, and divide into the desired number of bins, then move bin boundaries to optimize.

It gets more interesting in higher dimensionality. There, the problem will be NP hard. So finding the optimum will be expensive. You can indeed use clustering here: use complete-linkage clustering. For a O(n²) and O(n) memory approach, try CLINK. But in my experience, you will need to run this algorithm several times, on shuffled data, to get a good solution.

Upvotes: 1

Related Questions