Reputation: 4960
I have a collection of n point sets, where each point set contains up to m points.
I want to select one point from each point set, such that the resulting selection of points is as close together as possible. (where "closeness" has a reasonable definition, like the sum of squared distances from the centroid of the selected set of points.)
For example, the input collection could be:
Point Set A: [(2, 1), (1, 2), (6, 5)]
Point Set B: [(1, 1), (7, 3)]
Point Set C: [(3, 7), (5, 3)]
I want to select the three points, one from each point set, where the points are closest together. In this example, the three points at the bottom-left are closest together, but they do not include a point from C. The solution here would be the points on the right: (6, 5), (7, 3), and (5, 3). These are clustered around their centroid at (6, 3⅔).
The brute-force algorithm tries every possible combination of points from the collection and keeps track of the minimum value of the "closeness" function (i.e., an O(m^n) algorithm), but I'm wondering if there's a more efficient way that scales for large values of n and m -- if not in the worst case, then at least for most inputs.
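For concreteness, here is a minimal sketch of that brute force in Python, with "closeness" taken as the sum of squared distances from the centroid (the function names are mine, just for illustration):

```python
from itertools import product

def closeness(points):
    # Sum of squared distances from the centroid of the selected points.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

def brute_force(point_sets):
    # Try every combination of one point per set: O(m^n) candidates.
    return min(product(*point_sets), key=closeness)

sets = [[(2, 1), (1, 2), (6, 5)],   # Set A
        [(1, 1), (7, 3)],           # Set B
        [(3, 7), (5, 3)]]           # Set C
print(brute_force(sets))            # ((6, 5), (7, 3), (5, 3))
```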
Update: The points will have real values as coordinates; integers are used above to simplify the example.
Upvotes: 3
Views: 524
Reputation: 563
This can be seen as a combinatorial optimization problem. One way to solve it is to build a tree and do a depth-first inspection of each branch while keeping track of the current best set of points (this is called branch and bound). For your example, picture a tree with one level per point set: the root branches into the points of Set A, each of those into the points of Set B, and so on, so that every root-to-leaf path is one candidate selection.
You go down the leftmost branch and get a first result. Then, each time you go down another branch, if at some point the cost you are computing already exceeds your current best result, you can stop exploring that branch: going deeper cannot give a better result.
It may not pay off for a handful of points, but if you have 10 sets of 10 points it is a good method. A simple way to speed the process up is to put the smallest set at the top of the tree (fewer nodes and branches near the root).
Obviously, in the worst case the rightmost branch turns out to be the best one and nothing gets pruned. But it is very improbable that each successive branch will beat the previous one, so you should still save time on most inputs.
Note: don't forget to store the distance between two points once you have calculated it, so that you don't have to redo the calculation later.
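A minimal Python sketch of that branch and bound, assuming the sum-of-squared-distances-from-centroid objective from the question (the function and variable names are mine). The pruning is valid because that objective never decreases when a point is appended to a partial selection:

```python
import math

def sq_cost(points):
    # Sum of squared distances from the centroid; it never decreases
    # when a point is added, which is what makes the pruning valid.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)

def branch_and_bound(point_sets):
    sets = sorted(point_sets, key=len)     # smallest set at the top of the tree
    best_cost, best_sel = math.inf, None

    def descend(depth, chosen):
        nonlocal best_cost, best_sel
        if depth == len(sets):             # a leaf: a complete selection
            best_cost, best_sel = sq_cost(chosen), list(chosen)
            return
        for p in sets[depth]:
            chosen.append(p)
            # Bound: only descend if the partial cost is still below the best.
            if sq_cost(chosen) < best_cost:
                descend(depth + 1, chosen)
            chosen.pop()

    descend(0, [])
    return best_sel, best_cost

sets = [[(2, 1), (1, 2), (6, 5)], [(1, 1), (7, 3)], [(3, 7), (5, 3)]]
print(branch_and_bound(sets))   # the same three points as in the question
```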
Upvotes: 1
Reputation: 6274
You can use a Delaunay triangulation of the points. The graph of that triangulation encodes the proximity information: in particular, each point is connected to its nearest neighbour by an edge of the Delaunay triangulation. You can then use that triangulation, together with a union-find structure, to group the points that are close together.
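If you want to experiment with this, scipy.spatial.Delaunay computes the triangulation. A small sketch of extracting its edge graph over the pooled points of the example (the union-find grouping step is omitted):

```python
import numpy as np
from scipy.spatial import Delaunay

# Pool of all points from the example sets A, B and C.
points = np.array([(2, 1), (1, 2), (6, 5), (1, 1), (7, 3), (3, 7), (5, 3)])
tri = Delaunay(points)

# Every point's nearest neighbour is among its Delaunay neighbours,
# so these edges carry the proximity information.
edges = set()
for a, b, c in tri.simplices:
    edges |= {tuple(sorted((a, b))), tuple(sorted((b, c))), tuple(sorted((a, c)))}
print(sorted(edges))
```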
Upvotes: 1
Reputation: 2357
First of all, we can consider an improvement to the brute-force algorithm.
O(m^n) is a huge number! How can we narrow this search down? You are not looking for a global covering of the sets that guarantees a minimum; the solution only needs exactly one point from each set. Your new brute-force algorithm will look something like this:
For each point p in S0, find the nearest point to p in each of S1, ..., S(n-1), and keep the combination whose closeness value is smallest.
The computational complexity of this algorithm is O(m * n * m) = O(n * m^2): for each of the m points in S0 you do a linear nearest-neighbour scan over the n - 1 other sets of up to m points each.
Can we improve our algorithm further? Yes.
We can adopt a k-d tree to speed up the nearest-neighbour search. Basically you need to build one k-d tree per set, in O(m*log(m)) each, and use them to reduce the average-case complexity to O(m*n*log(m)).
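Here is a sketch of that heuristic using scipy.spatial.cKDTree, again with the sum-of-squared-distances objective assumed (names are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def sq_cost(points):
    pts = np.asarray(points, dtype=float)
    return float(((pts - pts.mean(axis=0)) ** 2).sum())

def nn_heuristic(point_sets):
    sets = [np.asarray(s, dtype=float) for s in point_sets]
    trees = [cKDTree(s) for s in sets[1:]]   # one k-d tree per set, O(m log m) each
    best_cost, best_sel = float("inf"), None
    for p in sets[0]:                        # m candidate starting points
        # Nearest neighbour of p in every other set, O(log m) per query on average.
        sel = [p] + [s[t.query(p)[1]] for s, t in zip(sets[1:], trees)]
        cost = sq_cost(sel)
        if cost < best_cost:
            best_cost, best_sel = cost, sel
    return best_sel, best_cost

sets = [[(2, 1), (1, 2), (6, 5)], [(1, 1), (7, 3)], [(3, 7), (5, 3)]]
print(nn_heuristic(sets))   # finds the optimum here, but that is not guaranteed
```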
Will this algorithm always find the minimum? No.
Consider, for instance, a configuration where the tightest overall cluster uses a point of S0 whose individual nearest neighbours in the other sets lie somewhere else entirely.
In such a case the closeness value obtained with the previous algorithm is not optimal; the algorithm is just a nearest-neighbour heuristic. The good news is that you will usually land near the solution, so you can adopt a random-restart hill-climbing algorithm to search for the global minimum.
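A sketch of such a random-restart hill climb, under my own formulation of the search space: a state is one chosen index per set, and a move swaps the chosen point of a single set:

```python
import random

def sq_cost(points):
    # Sum of squared distances from the centroid of the selected points.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)

def hill_climb(point_sets, restarts=20, seed=0):
    rng = random.Random(seed)
    best_cost, best_sel = float("inf"), None
    for _ in range(restarts):
        idx = [rng.randrange(len(s)) for s in point_sets]  # random starting selection
        while True:
            cur = sq_cost([s[i] for s, i in zip(point_sets, idx)])
            # Best single-set swap among all neighbours of the current selection.
            move, move_cost = None, cur
            for k, s in enumerate(point_sets):
                for j in range(len(s)):
                    trial = idx[:k] + [j] + idx[k + 1:]
                    c = sq_cost([t[i] for t, i in zip(point_sets, trial)])
                    if c < move_cost:
                        move, move_cost = (k, j), c
            if move is None:
                break                                      # local minimum reached
            idx[move[0]] = move[1]
        if cur < best_cost:
            best_cost = cur
            best_sel = [s[i] for s, i in zip(point_sets, idx)]
    return best_sel, best_cost

sets = [[(2, 1), (1, 2), (6, 5)], [(1, 1), (7, 3)], [(3, 7), (5, 3)]]
print(hill_climb(sets))
```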
Upvotes: 3