Reputation: 123
I have a set of objects. Each object is placed in the "space" and I know the distance between each object. I am looking for an algorithm for grouping objects far from each other. I choose the number of groups. And groups should be "balanced" (every group should contain the same number of items).
Example:
Assume I have 4 objects
{ A, B, C, D }
and i represent them in a two-dimension space:
I know the distance between each object so
{
AB = 1
AC = 3.6
AD = 5
BC = 2.8
BD = 4.2
and so on...
}
I want the algorithm to group objects in two groups and it should output
{[ A, C ][ B, D ]}
Of course this is easy with 4 objects, but it's hard with more items.
I've searched a lot, but i couldn't find anything for such a grouping. I've read a lot about k-means clustering and other clustering methods, but they don't suit because they group similar objects.
What is the best solution?
EDIT
The formalization of the problem could be the maximization of the distance between elements in each group. So that's why the algorithm should group A and C, B and D.
A and D, B and C it's not a good solution.
The algorithm should deal with N items (N > 2) and K groups (K < N, i choose how many groups)
Upvotes: 1
Views: 290
Reputation: 123
My problem was not the algorithm. The problem was formalizing the problem itself. And Gene helped me with his comment
You need to formalize the definition of a "good" solution. I suggest maximum sum of squares of pairwise distances. I'm confident the problem of maximizing this quantity is NP hard. The trivial solution of repeatedly picking and removing the "farthest pair" is easy. So now you are down to some kind of search. – Gene Jul 6 at 15:36
Thanks
Upvotes: 0
Reputation: 77454
You can probably abuse clustering algorithms here, by giving them a distance when they expect a similarity, or the other way round.
Have a look at hierarchical clustering, it should be easy to "break" it the way you want.
But more likely than not, the results will be not very convincing on anything but such toy scenarios, because "dissimilarity" is not transitive.
Usually, when A similar to B, and B similar to C, you will want to have A, B, and C in the same cluster.
But when A is dissimilar to B, B is dissimilar to C, then A and C could be very similar; so you don't want them to cluster... ''maybe'' complete linkage clustering (when abuse as discussed above) can work for you, though.
Upvotes: 0