Reputation: 431
I have a list L of ~30k locations (written as longitude/latitude pairs), and a list E of ~1M events (with locations written as longitude/latitude pairs), each of which occurs at a point in L. I want to tag each event in E with its corresponding location in L. But the coordinates in L and E are rounded differently (E to five decimal places, L to thirteen), so ostensibly identical coordinates can actually differ by ~10^-5 degrees, or ~1 meter. (Points in L are separated by at least ~10 m.)
I thus need the nearest point in L to every point of E; the obvious O(|L||E|) brute-force algorithm is too slow. L is small enough compared to E that algorithms that preprocess L and amortize the preprocessing time over E are fine. Is this a well-studied problem? The links that I can find are for related but distinct problems, like finding the minimal distance between a pair of points in one set.
Possibly relevant: Voronoi diagrams, though I can't see how preprocessing L into a Voronoi diagram would save me computational time.
Upvotes: 3
Views: 4763
Reputation: 1929
My library GeographicLib contains a class NearestNeighbor which implements vantage-point trees. These work for any data set with a true distance metric; the geodesic distance obviously satisfies this criterion.
Basically, you initialize the class with your L locations and then, for each of the E events, you ask for the nearest location. The breakdown of the computational costs is as follows:
set-up cost: L log L
cost per query: log L
cost of all queries: E log L
total cost: (L + E) log L
(These are base-2 logs.) Here's code that will do the lookup. Running this on 30k locations and 1M events takes about 40 secs and involves a total of 16M geodesic distance calculations. (The brute-force way would take about 21 hours.)
// Read lat/lon locations from locations.txt and lat/lon events from
// events.txt. For each event print to closest.txt: the index for the
// closest location and the distance to it.
// Needs GeographicLib 1.47 or later
// compile and link with
// g++ -o closest closest.cpp -lGeographic \
// -L/usr/local/lib -Wl,-rpath=/usr/local/lib
#include <iostream>
#include <vector>
#include <fstream>
#include <GeographicLib/NearestNeighbor.hpp>
#include <GeographicLib/Geodesic.hpp>
using namespace std;
using namespace GeographicLib;
// A structure to hold a geographic coordinate.
struct pos {
double lat, lon;
pos(double lat = 0, double lon = 0)
: lat(lat), lon(lon) {}
};
// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
Geodesic _geod;
public:
explicit DistanceCalculator(const Geodesic& geod)
: _geod(geod) {}
double operator() (const pos& a, const pos& b) const {
double s12;
_geod.Inverse(a.lat, a.lon, b.lat, b.lon, s12);
return s12;
}
};
int main() {
// Define a distance function object
DistanceCalculator distance(Geodesic::WGS84());
// Read in locations
vector<pos> locs;
{
double lat, lon;
ifstream is("locations.txt");
while (is >> lat >> lon)
locs.push_back(pos(lat, lon));
}
// Construct NearestNeighbor object
NearestNeighbor<double, pos, DistanceCalculator>
locationset(locs, distance);
ifstream is("events.txt");
ofstream os("closest.txt");
double lat, lon, d;
vector<int> k;
while (is >> lat >> lon) {
pos event(lat, lon);
d = locationset.Search(locs, distance, event, k);
os << k[0] << " " << d << "\n";
}
}
Upvotes: 1
Reputation: 20228
From your description (coordinates rounded to five decimal places give a ~1 m grid, and points in L are at least ~10 m apart):
Solution: Round the coordinates in L to the same precision as E and match equal pairs.
Explanation: Rounding effectively snaps each point to its nearest node on a square grid. As long as the grid resolution (~1 meter when rounding to five decimals) is below half the minimal distance between two points in L (~5 meters), you are good to go.
For maximum performance, and to exploit |L| << |E|, you might want to build a hash map over the rounded coordinates in L and match each point of E in near-constant time. Total runtime would then be limited by O(|E|).
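For illustration, here is a minimal sketch of that hash-map lookup (not part of the original answer), assuming the same locations.txt/events.txt layout as the GeographicLib example above; the file names and the -1 sentinel for an event whose rounded coordinate matches no location are just placeholders.
// Tag events with locations by exact match on coordinates rounded to 5 decimals.
// Assumes the coordinates in events.txt were rounded from the same values that
// appear (with more digits) in locations.txt.
#include <cmath>
#include <cstdint>
#include <fstream>
#include <unordered_map>
using namespace std;
// Pack a lat/lon pair, rounded to 5 decimal places, into one 64-bit key.
uint64_t key(double lat, double lon) {
  int64_t la = llround(lat * 100000.0) +  9000000;   // lat in [-90, 90]   -> 0 .. 18,000,000
  int64_t lo = llround(lon * 100000.0) + 18000000;   // lon in [-180, 180] -> 0 .. 36,000,000
  return uint64_t(la) * 40000000ULL + uint64_t(lo);  // unique key per grid node
}
int main() {
  // Map each rounded location coordinate to its index in L.
  unordered_map<uint64_t, int> table;
  {
    ifstream is("locations.txt");
    double lat, lon;
    for (int i = 0; is >> lat >> lon; ++i)
      table[key(lat, lon)] = i;
  }
  // Look up each event; -1 marks an event with no exact rounded match.
  ifstream is("events.txt");
  ofstream os("closest.txt");
  double lat, lon;
  while (is >> lat >> lon) {
    auto it = table.find(key(lat, lon));
    os << (it != table.end() ? it->second : -1) << "\n";
  }
}
Building the table is O(|L|) and each lookup is expected O(1), so the whole run is dominated by reading the |E| events.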
Upvotes: 1
Reputation: 3477
This is amenable to a fairly direct application of space-partitioning. Most GIS libraries should provide tools to do this.
My own experience is restricted to using R-trees: I create an index of the L items, using either the points directly or a bounding box representing the uncertainty around each point. The R-tree then supports an efficient (O(log L) time) query for the n nearest neighbouring points/bounding boxes.
The implementation I have used successfully is libspatialindex wrapped into a Python library called Rtree.
However, note that this specific tool is accurate only with Euclidean x, y coordinates. It has significant potential for error with lat/lon coordinates away from the equator, particularly if you want to cover a geographically large area. In my case I was restricted to a single country, within which I used eastings/northings. I'm not aware of which libraries support handling the problem with great-circle distances, but they certainly ought to be available.
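To make the R-tree recipe concrete: the sketch below is not from this answer (which uses libspatialindex through the Python Rtree wrapper); it shows the same nearest-neighbour query in C++ with Boost.Geometry's rtree, on toy projected x/y coordinates.
// Nearest-location lookup via an R-tree over projected (x, y) coordinates.
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>
namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;
int main() {
  typedef bg::model::point<double, 2, bg::cs::cartesian> point;  // easting/northing
  typedef std::pair<point, unsigned> value;                      // point plus its index in L
  // Toy stand-in for the ~30k locations in L.
  std::vector<point> locs = { point(0, 0), point(10, 0), point(0, 10), point(7, 7) };
  // Index every location together with its position in L.
  bgi::rtree<value, bgi::quadratic<16> > tree;
  for (unsigned i = 0; i < locs.size(); ++i)
    tree.insert(std::make_pair(locs[i], i));
  // For each event, ask the tree for its single nearest location.
  point event(6.5, 6.8);
  std::vector<value> hit;
  tree.query(bgi::nearest(event, 1), std::back_inserter(hit));
  std::cout << "closest location index: " << hit[0].second << "\n";
}
With eastings/northings this mirrors the single-country setup described above; for a geographically large area you would reproject first, or fall back to a geodesic-aware structure such as the vantage-point tree in the GeographicLib answer.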
Upvotes: 2
Reputation: 2018
Yes, you are correct. First, you can construct the Voronoi diagram of your point set of locations L in O(|L| log |L|) time using Fortune's sweep-line approach. There are various implementations out there that can be used; Triangle would be among the most common ones.
Now you have a partitioning of the plane of O(|L|) size. To allow O(log |L|) nearest-neighbour queries you need a search structure on top of the Voronoi diagram. A common approach is the Dobkin-Kirkpatrick hierarchy; details can be found in various lecture notes. This method supports O(log |L|) queries and also requires only O(|L|) space. (Also mentioned in this post.)
Then the |E| queries can be accomplished in O(|E| log |L|) time.
A different approach would be to use k-d trees. From an implementation point of view they might be less work and provide the same complexities (as far as I know). A quick search revealed these two implementations that may be worth testing: C++, Java.
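As an illustration of the Voronoi route (not from this answer), here is a minimal sketch using CGAL's 2-D Delaunay triangulation, the dual of the Voronoi diagram, on toy projected coordinates. Note that CGAL's nearest_vertex() walks the triangulation rather than using a Dobkin-Kirkpatrick hierarchy, so the worst-case bound differs, though it is fast in practice.
// Nearest-location lookup via the Delaunay triangulation (dual of the Voronoi diagram of L).
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Delaunay_triangulation_2.h>
#include <iostream>
#include <vector>
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Delaunay_triangulation_2<K>                    Delaunay;
typedef K::Point_2                                           Point;
int main() {
  // Toy stand-in for the locations in L (projected x/y coordinates).
  std::vector<Point> locs = { Point(0, 0), Point(10, 0), Point(0, 10), Point(7, 7) };
  // Build the Delaunay triangulation of L.
  Delaunay dt;
  dt.insert(locs.begin(), locs.end());
  // nearest_vertex() returns the site whose Voronoi cell contains the query,
  // i.e. the closest location in L to the event.
  Point event(6.5, 6.0);
  Delaunay::Vertex_handle v = dt.nearest_vertex(event);
  std::cout << "closest location: " << v->point() << "\n";
}
To get back the index of the matched location rather than its coordinates, CGAL's Triangulation_vertex_base_with_info_2 lets you attach an integer to each vertex; a k-d tree implementation will typically hand back indices directly.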
Upvotes: 2