Reputation: 39
I am trying to find the minimal distance between two sets of points in the Manhattan metric. I have been searching for information about this, but I haven't found anything.
#include <bits/stdc++.h>
using namespace std;
#define st first
#define nd second

pair<int, int> pointsA[1000001];
pair<int, int> pointsB[1000001];

int main() {
    int n, t;
    unsigned long long dist;
    scanf("%d", &t);
    while (t-- > 0) {
        dist = 4000000000ULL;
        scanf("%d", &n);
        for (int i = 0; i < n; i++) {
            scanf("%d%d", &pointsA[i].st, &pointsA[i].nd);
        }
        for (int i = 0; i < n; i++) {
            scanf("%d%d", &pointsB[i].st, &pointsB[i].nd);
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd) < dist) {
                    dist = abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd);
                }
            }
        }
        printf("%llu\n", dist);  // print once per test case
    }
}
My code works in O(n^2), but it is too slow. I do not know whether it will be useful, but y in pointsA is always > 0 and y in pointsB is always < 0. My code compares each pair's distance against the current minimum and keeps the smallest.
For example:
input:
2
3
-2 2
1 3
3 1
0 -1
-1 -2
1 -2
1
1 1
-1 -1
Output:
5
4
Upvotes: 2
Views: 1544
Reputation: 6404
Keep a list of candidates in group A and a list in group B, initially containing the whole input. Take the minimum y of A and the maximum y of B, which form the pair closest in y, calculate their Manhattan distance, and eliminate from the candidate lists any point whose y alone puts it beyond that upper bound. This might slash the input or it might have essentially no effect, but it's O(N) and a cheap first step.
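For illustration, a minimal sketch of that first step (names are illustrative, using vectors of (x, y) pairs; it assumes, as in the question, that every A point has y > 0 and every B point has y < 0):
#include <bits/stdc++.h>
using namespace std;

typedef pair<long long, long long> Pt;  // (x, y)

// Returns an upper bound on the minimal A-to-B Manhattan distance and
// drops every point that this bound already rules out.
long long prune(vector<Pt> &A, vector<Pt> &B) {
    // the A point lowest in y and the B point highest in y are the pair
    // closest in y; their Manhattan distance bounds the answer from above
    Pt aMin = *min_element(A.begin(), A.end(),
        [](const Pt &p, const Pt &q) { return p.second < q.second; });
    Pt bMax = *max_element(B.begin(), B.end(),
        [](const Pt &p, const Pt &q) { return p.second < q.second; });
    long long bound = llabs(aMin.first - bMax.first) +
                      llabs(aMin.second - bMax.second);

    // a point whose vertical gap to the other set already exceeds the
    // bound can never be part of a closer pair, so it can be discarded
    A.erase(remove_if(A.begin(), A.end(), [&](const Pt &p) {
                return p.second - bMax.second > bound;
            }), A.end());
    B.erase(remove_if(B.begin(), B.end(), [&](const Pt &p) {
                return aMin.second - p.second > bound;
            }), B.end());
    return bound;
}

int main() {
    // the first test case from the question
    vector<Pt> A = {{-2, 2}, {1, 3}, {3, 1}};
    vector<Pt> B = {{0, -1}, {-1, -2}, {1, -2}};
    printf("%lld\n", prune(A, B));  // prints 5, which here is also the true answer
}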
Now sort the remaining candidates in x and in y. This gives you a separated list in y and a mixed list in x, and is O(N log N), where N has been cut down, hopefully but not necessarily, by step one. For each point, now calculate its closest neighbour in y (trivial) and its closest in x (a bit harder), then calculate its minimum possible Manhattan distance, assuming the closest in x is also the closest in y. Eliminate any points further than your bound from the candidate list. Now sort again, by minimum possible distance. That's another O(N log N) operation. Now start with your best candidate and find its genuine minimum distance, by trying the closest point in either direction in x or y, terminating when either delta x or delta y goes above the best so far or above your maximum bound. If you have a better candidate pair than the current candidate pair, purge the candidate list of everything with a worse minimum possible distance. If the best candidate point doesn't form half of a candidate pair, just purge that one point.
When you've purged a certain number of candidates, recalculate the lists. I'm not sure what the best threshold would be; certainly once you get down to the worst candidate you must recalculate and then start again at the best. Maybe use 50%.
Eventually you are left with only one candidate pair. I'm not quite sure what the analysis is - worst case, I suppose, you only eliminate a few candidates on each pass. But for most inputs you should get the candidate list down to a small value pretty fast.
Upvotes: 0
Reputation: 40335
So there's a really simple optimization you can make that can cut a ton of time off.
Since you state that all points in set A have y > 0 and all points in set B have y < 0, you can immediately discard any point in A whose y > mindist so far, and any point in B whose y < -mindist so far. These points can never be closer than the current closest pair:
for (int i = 0; i < n; i++) {
    if (pointsA[i].nd > dist)
        continue; // <- this is the big one
    for (int j = 0; j < n; j++) {
        if (pointsB[j].nd < -dist)
            continue; // <- helps although not as much
        if (abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd) < dist) {
            dist = abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd);
        }
    }
}
printf("%llu\n", dist);
For a test of 40000 points per set, on my machine with gcc and -O2 this reduces the time from 8.2 seconds down to roughly 0.01 seconds (and yields correct results)! (Measured with QueryPerformanceCounter on Windows.)
Not too shabby.
Fwiw, computing your distance twice isn't actually that big of a deal. First of all, that "second" calculation doesn't happen all that often; it only happens when a closer distance is found.
And secondly, for reasons I can't explain, storing it in a variable and only calculating it once actually, consistently, seems to add about 20% to the total run time, raising it from an average of 8.2 seconds to about 10.5 seconds for the above set.
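For reference, the "store it in a variable" variant being compared is just the inner test rewritten like this (an illustrative rewrite, not the exact code that was benchmarked):
long long d = abs(pointsA[i].st - pointsB[j].st)
            + abs(pointsA[i].nd - pointsB[j].nd);
if (d < dist) {
    dist = d;
}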
I'd say discarding points based on your assumptions about the Y values is by far the biggest bang for your buck you can get without significantly changing your algorithm.
You may be able to take advantage of that further by pre-sorting A in order of increasing Y and B in order of decreasing Y before finding the distances, to maximize the chance of skipping over the remaining points.
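A sketch of that pre-sorted variant (the sorting happens once per test case before the loops; with the sets ordered this way, the continues above can become breaks, since once one point fails the y test every later point fails it too):
// sort A by increasing y and B by decreasing y (i.e. increasing distance
// from the x-axis), so a failed y test ends the scan instead of skipping
sort(pointsA, pointsA + n,
     [](const pair<int,int> &a, const pair<int,int> &b) { return a.nd < b.nd; });
sort(pointsB, pointsB + n,
     [](const pair<int,int> &a, const pair<int,int> &b) { return a.nd > b.nd; });

for (int i = 0; i < n; i++) {
    if (pointsA[i].nd > dist)
        break;            // every later A point is even further from the axis
    for (int j = 0; j < n; j++) {
        if (pointsB[j].nd < -dist)
            break;        // every later B point is even further from the axis
        if (abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd) < dist) {
            dist = abs(pointsA[i].st - pointsB[j].st) + abs(pointsA[i].nd - pointsB[j].nd);
        }
    }
}
printf("%llu\n", dist);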
Upvotes: 0
Reputation: 9394
My solution (note that for simplicity I do not care about overflow in manhattan_dist, and for that reason it does not work with unsigned long long):
#include <cstdlib>
#include <cstdio>
#include <cassert>
#include <vector>
#include <limits>
#include <algorithm>
typedef std::pair<int, int> Point;
typedef std::vector<std::pair<int, int> > PointsList;
static inline bool cmp_by_x(const Point &a, const Point &b)
{
if (a.first < b.first) {
return true;
} else if (a.first > b.first) {
return false;
} else {
return a.second < b.second;
}
}
static inline bool cmp_by_y(const Point &a, const Point &b)
{
if (a.second < b.second) {
return true;
} else if (a.second > b.second) {
return false;
} else {
return a.first < b.first;
}
}
static inline unsigned manhattan_dist(const Point &a, const Point &b)
{
return std::abs(a.first - b.first) +
std::abs(a.second - b.second);
}
int main()
{
unsigned int n_iter = 0;
if (scanf("%u", &n_iter) != 1) {
std::abort();
}
for (unsigned i = 0; i < n_iter; ++i) {
unsigned int N = 0;
if (scanf("%u", &N) != 1) {
std::abort();
}
if (N == 0) {
continue;
}
PointsList pointsA(N);
for (PointsList::iterator it = pointsA.begin(), endi = pointsA.end(); it != endi; ++it) {
if (scanf("%d%d", &it->first, &it->second) != 2) {
std::abort();
}
assert(it->second > 0);
}
PointsList pointsB(N);
for (PointsList::iterator it = pointsB.begin(), endi = pointsB.end(); it != endi; ++it) {
if (scanf("%d%d", &it->first, &it->second) != 2) {
std::abort();
}
assert(it->second < 0);
}
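// sort both sets by y; the first A point and the last B point are then
// the ones closest to the x-axis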
std::sort(pointsA.begin(), pointsA.end(), cmp_by_y);
std::sort(pointsB.begin(), pointsB.end(), cmp_by_y);
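// the pair closest in y gives the initial upper bound on the answer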
const PointsList::const_iterator min_a_by_y = pointsA.begin();
const PointsList::const_iterator max_b_by_y = (pointsB.rbegin() + 1).base();
assert(*max_b_by_y == pointsB.back());
unsigned dist = manhattan_dist(*min_a_by_y, *max_b_by_y);
const unsigned diff_x = std::abs(min_a_by_y->first - max_b_by_y->first);
const unsigned best_diff_y = dist - diff_x;
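// y thresholds: a point whose vertical gap alone exceeds the bound can be dropped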
const int max_y_for_a = max_b_by_y->second + dist;
const int min_y_for_b = min_a_by_y->second - dist;
PointsList::iterator it;
for (it = pointsA.begin() + 1; it != pointsA.end() && it->second <= max_y_for_a; ++it) {
}
if (it != pointsA.end()) {
pointsA.erase(it, pointsA.end());
}
PointsList::reverse_iterator rit;
for (rit = pointsB.rbegin() + 1; rit != pointsB.rend() && rit->second >= min_y_for_b; ++rit) {
}
if (rit != pointsB.rend()) {
pointsB.erase(pointsB.begin(), (rit + 1).base());
}
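// re-sort the surviving candidates by x for the windowed search below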
std::sort(pointsA.begin(), pointsA.end(), cmp_by_x);
std::sort(pointsB.begin(), pointsB.end(), cmp_by_x);
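// for each remaining A point, binary-search the x-range of B points that
// could still beat the current best and scan it backwards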
for (size_t j = 0; diff_x > 0 && j < pointsA.size(); ++j) {
const Point &cur_a_point = pointsA[j];
assert(max_y_for_a >= cur_a_point.second);
const int diff_x = dist - best_diff_y;
const int min_x = cur_a_point.first - diff_x + 1;
const int max_x = cur_a_point.first + diff_x - 1;
const Point search_term = std::make_pair(max_x, std::numeric_limits<int>::min());
PointsList::const_iterator may_be_near_it = std::lower_bound(pointsB.begin(), pointsB.end(), search_term, cmp_by_x);
for (PointsList::const_reverse_iterator rit(may_be_near_it); rit != pointsB.rend() && rit->first >= min_x; ++rit) {
const unsigned cur_dist = manhattan_dist(cur_a_point, *rit);
if (cur_dist < dist) {
dist = cur_dist;
}
}
}
printf("%u\n", dist);
}
}
Benchmark on my machine (Linux + i7 2.70 GHz + gcc -Ofast -march=native):
$ make bench
time ./test1 < data.txt > test1_res
real 0m7.846s
user 0m7.820s
sys 0m0.000s
time ./test2 < data.txt > test2_res
real 0m0.605s
user 0m0.590s
sys 0m0.010s
test1 is your variant, and test2 is mine.
Upvotes: 2
Reputation: 179819
You'll need to learn how to write functions and how to use containers. With your current coding style, it's infeasible to get a better solution.
The problem is that the better solution is a recursive method. Sort the points by X coordinate. Now recursively split the set in half and determine the closest distance within each half as well as the closest distance between a pair of points from either half.
The last part is efficient because both halves are sorted by X. Comparing the last values from the left half with the first value of the right half gives a good upper bound on the distance.
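For what it's worth, here is a minimal sketch of that recursion (the classic closest-pair divide and conquer, using the Manhattan metric; names are illustrative). Note that it finds the closest pair within a single combined set, so for the A-versus-B problem in the question you would additionally have to tag each point with its set and only accept mixed pairs:
#include <bits/stdc++.h>
using namespace std;

typedef pair<long long, long long> Pt;  // (x, y)

static long long manhattan(const Pt &a, const Pt &b) {
    return llabs(a.first - b.first) + llabs(a.second - b.second);
}

// closest pair in pts[lo, hi); pts must be sorted by x before the first call
static long long closest(vector<Pt> &pts, size_t lo, size_t hi) {
    if (hi - lo <= 3) {  // tiny base case: brute force
        long long best = LLONG_MAX;
        for (size_t i = lo; i < hi; ++i)
            for (size_t j = i + 1; j < hi; ++j)
                best = min(best, manhattan(pts[i], pts[j]));
        return best;
    }
    size_t mid = lo + (hi - lo) / 2;
    long long midx = pts[mid].first;
    long long best = min(closest(pts, lo, mid), closest(pts, mid, hi));

    // combine step: only points within `best` of the dividing line can form
    // a closer pair, because |dx| alone would already reach `best`
    vector<Pt> strip;
    for (size_t i = lo; i < hi; ++i)
        if (llabs(pts[i].first - midx) < best)
            strip.push_back(pts[i]);
    sort(strip.begin(), strip.end(),
         [](const Pt &a, const Pt &b) { return a.second < b.second; });
    for (size_t i = 0; i < strip.size(); ++i)
        for (size_t j = i + 1;
             j < strip.size() && strip[j].second - strip[i].second < best; ++j)
            best = min(best, manhattan(strip[i], strip[j]));
    return best;
}

int main() {
    vector<Pt> pts = {{0, 0}, {2, 3}, {10, 10}};
    sort(pts.begin(), pts.end());                    // sort by x (ties by y)
    printf("%lld\n", closest(pts, 0, pts.size()));   // prints 5
}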
Upvotes: 1