Reputation: 5097
The standard sweep line algorithm for closest pairs is well known, as described here, which uses a sweep line to sweep horizontally across the set of points, maintaining only those within the current best distance of the current point.
Normally, the points must initially be sorted by x-coordinate, and the bounding box (std::set in the case of a c++ implementation) must be sorted by y-coordinate, as can be seen in this c++ implementation.
However, when trying to implementing I accidentally forgot to sort the points by x-coordinate and instead sorted them by y-coordinate. Surprisingly, this still seemed to work. You can see my implementation here, which essentially follows a slightly modified version of the standard linesweep closest pairs algorithm:
#include <iostream>
#include <set>
#include <algorithm>
#include <math.h>
#include <vector>
using namespace std;
#define x second
#define y first
typedef pair<long long, long long> pll;
inline double dist(pll p1, pll p2)
{
return sqrt((double) (p2.y - p1.y)*(p2.y - p1.y) + (p2.x - p1.x)*(p2.x - p1.x));
}
int main(int argc, const char * argv[])
{
int numPoints;
cin >> numPoints;
vector <pll> points;
points.resize(numPoints);
for (int i = 0; i < numPoints; i++)
{
cin >> points[i].x >> points[i].y;
}
//Sorts the points by y coordinate (because y is first)
sort(points.begin(), points.end());
double shortestDistSoFar = INFINITY;
set <pll> boundingBox; //Bounding box maintained by y-coordinate
boundingBox.insert(points[0]);
int left = 0;
pll best1, best2;
for (int i = 1; i < numPoints; i++)
{
//Maintain only points to the left of the current point whose distance is less than bestDist
while ((left < i) && (points[i].x - points[left].x > shortestDistSoFar))
{
boundingBox.erase(points[left]);
left++;
}
//Consider only points within bestDist of the current point
for (auto it = boundingBox.lower_bound(pll(points[i].y - shortestDistSoFar, points[i].x - shortestDistSoFar));
it != boundingBox.end() && it->y <= points[i].y + shortestDistSoFar; it++)
{
if (dist(*it, points[i]) < shortestDistSoFar)
{
shortestDistSoFar = dist(*it, points[i]);
best1 = *it;
best2 = points[i];
}
}
boundingBox.insert(points[i]);
}
return 0;
}
Visit each point in order of increasing y-coordinate and for each point check all points from y-bestDist to y+bestDist, updating bestDist when a new shortest distance is found and removing points from the set whose x-coordinate distance from the current point is greater than bestDist.
Does this modified algorithm still work (I only tested for a few cases), and is the running time still O(N lgN)?
Upvotes: 4
Views: 668
Reputation: 18556
It does work correctly(because the points are deleted from the set only when they can be safely deleted). However, it has O(n ^ 2)
time complexity, because the points are not always deleted when they should be.
This simple generator(written in python3):
from sys import argv
n = int(argv[1])
dy = 1
dx = -n
print(n)
for i in range(n):
print(dx * i, dy * i)
creates a test case that makes your code perform O(n ^ 2)
operations for any n
.
The idea of this test case is very simple: if the points (after sorting by y coordinate) are iterated in a decreasing order of x-coordinate, they are never removed from the set. So if they are located close to each other by y coordinate and far by x coordinate, the entire set is traversed every time. That's why all pairs of points are checked(and there are exactly n * (n - 1) / 2
pairs).
Now let's see how it works in practice:
Firstly, I compiled your code with g++ -std=c++11 -O2 closest_pair.cpp
.
After that I ran a bunch of tests:
temp$ python3 gen.py 10000 > input
temp$ time ./a.out < input
real 0m0.805s
user 0m0.797s
sys 0m0.008s
temp$ python3 gen.py 30000 > input
temp$ time ./a.out < input
real 0m7.195s
user 0m7.198s
sys 0m0.004s
temp$ python3 gen.py 50000 > input
temp$ time ./a.out < input
real 0m23.711s
user 0m23.725s
sys 0m0.004s
As you can see, it works really slowly.
Upvotes: 4