Picormorant
Picormorant

Reputation: 1

Trouble understanding Closest-Pair divide and conquer algorithm

I'm new into coding and today I completed the trivial solution for the Closest-Pair problem in a 2-D space. (2 for loops)

However I gave up finding any solution which could do it in O(n log n). Even after researching it, I still don't understand how this can be faster than the trivial method.

What I understand: -> At first we split the array in 2 halfs and sort everything only considering the X coordinates. This can be done in n log n.

Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)? In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.

There is a solution in 1-D which absolutely makes sense to me. After sorting we know, that the distance between two non-adjacent points can't be lower than the distance of at least 2 adjacent ones. However this is not true for 2-D space, since we have an additional Y coordinate which could lead to the lowest distance between two points which are not adjacent on the X axis.

Upvotes: 0

Views: 634

Answers (1)

collapsar
collapsar

Reputation: 17238

First of all, heed the advice of user @Evg - this answer cannot substitute the comprehensive description and mathematically rigorous analysis of the algorithm.

However, here are some ideas to get the intuition started:

  1. (Recursion structure) The question states:

    Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)? In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.

    The recursion, however, does not stop at level 1 - assume for the sake of the argument that some O(n log n) algorithm works. Finding closest pairs among N/2 points applying that very algorithm takes O(N/2 log N/2) - not O((N/2)^2).

  2. (Consequences of finding a closest pair in one half)
    If you have found a closest pair (p, q) in the 'left' half of the point set, this pair's distance sets an upper bound to the width of a corridor around the halving line from which a closer pair (r, s) with r from the left, s from the right half can be drawn. If the closest distance found so far is 'small', it significantly reduces the size of the candidate set. As the points have been ordered by their x coordinate, the algorithm can exploit the information efficiently. Said corridor may still cover up to the whole set of N points, but if it does, it provides information of the geometry of the point set: the points of each half will basically be aligned along a vertical line. This information can be exploited algorithmically - the most naive way would be to execute the algorithm once again but sorting along y coordinates and halving the point set by a horizontal line. Note that executing any algorithm a constant number of times does not change asymptotic run time expressed by the O(.) notation.

  3. (Finding a close pair with one point from each half)
    Consider checking a pair of points (r, s), one point from each half. It is known that the difference in their x and y coordinates, resp., mustn't exceed the minimal distance d found so far. It is known from the recursion that there can be no points r', s' (r' from the left, s' from the right half) closer to r, s, resp., than d. So given some r there cannot be 'many' candidates from the other half.
    Imagine a circle of radius d drawn around r. Any point s from the other half being closer than d must be located within that circle. Let there be a few of them - however, the minimum distance among each pair still be at least d. The maximum number of points that can be distributed within a circle of radius d such that the distance between each pair of them is at least d is 7 - think of a regular hexagon with side length d and its center coinciding with the circle's center. So after the recursion, at most every r from the left half needs to be checked against at max a constant number of points from the other half which makes the part of the algorithm after the recursion run in O(N).
    Note that finding the pairing candidates for a given r is an efficient operation - the points from both halves have been sorted by the same criterion.

Upvotes: 1

Related Questions