Rexxowski

Reputation: 180

What is the best nearest neighbor algorithm for my case?

I have a predefined list of GPS positions which basically makes up a predefined car track. There are around 15000 points in the list. The whole list is known in advance; no points need to be inserted afterwards. Then I get around 1 million extra sampled GPS positions for which I need to find the nearest neighbor in the predefined list. I need to process all 1 million items in a single pass, and I need to do it as quickly as possible. What would be the best nearest neighbor algorithm for this case? I can preprocess the predefined list as much as I need, but processing the 1 million items afterwards should be as quick as possible.
I have tested a KDTree C# implementation, but the performance seemed to be poor; maybe there is a more appropriate algorithm for my 2D data (the GPS altitude is ignored in my case). Thank you for any suggestions!

My gps track


Upvotes: 1

Views: 1049

Answers (2)

ldog

Reputation: 12141

CGAL has a 2d point library for nearest neighbour and range searches based on a Delaunay triangulation data structure.

Here is a benchmark of their library for your use case:

// file: cgal_benchmark_2dnn.cpp
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Point_set_2.h>
#include <chrono>
#include <iostream>
#include <list>
#include <random>

typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Point_set_2<K>::Vertex_handle Vertex_handle;
typedef K::Point_2 Point_2;

/**
 * @brief Time a lambda function.
 *
 * @param lambda - the function to execute and time
 *
 * @return the number of microseconds elapsed while executing lambda
 */
template <typename Lambda>
std::chrono::microseconds time_lambda(Lambda lambda) {
  auto start_time = std::chrono::high_resolution_clock::now();
  lambda();
  auto end_time = std::chrono::high_resolution_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end_time -
                                                               start_time);
}

int main() {
  const int num_index_points = 15000;
  const int num_trials = 1000000;

  std::random_device
      rd; // Will be used to obtain a seed for the random number engine
  std::mt19937 gen(rd()); // Standard mersenne_twister_engine seeded with rd()
  std::uniform_real_distribution<> dis(-1, 1.);
  std::list<Point_2> index_point_list;

  {
    auto elapsed_microseconds = time_lambda([&] {
      for (int i = 0; i < num_index_points; ++i) {
        index_point_list.emplace_back(dis(gen), dis(gen));
      }
    });
    std::cout << " Generating " << num_index_points << " random points took "
              << elapsed_microseconds.count() << " microseconds.\n";
  }

  CGAL::Point_set_2<K> point_set;
  {
    auto elapsed_microseconds = time_lambda([&] {
      point_set.insert(index_point_list.begin(), index_point_list.end());
    });
    std::cout << " Building point set took " << elapsed_microseconds.count()
              << " microseconds.\n";
  }

  {
    auto elapsed_microseconds = time_lambda([&] {
      for (int j = 0; j < num_trials; ++j) {
        Point_2 query_point(dis(gen), dis(gen));
        Vertex_handle v = point_set.nearest_neighbor(query_point);
      }
    });
    auto rate = elapsed_microseconds.count() / static_cast<double>(num_trials);
    std::cout << " Querying " << num_trials << " random points took "
              << elapsed_microseconds.count()
              << " microseconds.\n >> Microseconds / query :" << rate << "\n";
  }
}

On my system (Ubuntu 18.04) this can be compiled with

g++ cgal_benchmark_2dnn.cpp -lCGAL -lgmp -O3

and when run yields the performance:

 Generating 15000 random points took 1131 microseconds.
 Building point set took 11469 microseconds.
 Querying 1000000 random points took 2971201 microseconds.
 >> Microseconds / query :2.9712

Which is pretty fast. Note that with N processors you could speed this up by roughly a factor of N, since the queries are independent of one another.
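As a minimal sketch of that parallelization, reusing the typedefs from the benchmark above and assuming that nearest_neighbor() calls on an already-built Point_set_2 are read-only and safe to issue from multiple threads (worth verifying against the CGAL documentation for your version), the query loop could be split across threads with OpenMP:

// Sketch only: parallel nearest-neighbour queries over a prebuilt point set.
// Assumes concurrent read-only queries on CGAL::Point_set_2 are safe.
#include <cstddef>
#include <vector>

std::vector<Point_2> query_all(CGAL::Point_set_2<K> &point_set,
                               const std::vector<Point_2> &queries) {
  std::vector<Point_2> nearest(queries.size());
#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(queries.size()); ++i) {
    Vertex_handle v = point_set.nearest_neighbor(queries[i]);
    nearest[i] = v->point(); // nearest index point for query i
  }
  return nearest;
}

This would additionally need the -fopenmp flag when compiling with g++.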

Fastest possible implementation

If two or more of the following are true:

  1. You have a small bounding box for the 15000 index points
  2. You only care about precision up to a few decimal places (note that for lat & long coordinates, going much beyond 6 decimal places yields centimeter/millimeter scale precision)
  3. You have copious amounts of memory on your system

Then cache everything! You can pre-compute a grid of desired precision over your bounding box of index points. Map each grid cell to a unique address that can be indexed knowing the 2D coordinate of a query point.

Then simply use any nearest neighbour algorithm (such as the one I supplied) to map each grid cell to the nearest index point. Note this step only has to be done once to initialize the grid cells within the grid.

To run a query, this would require one 2D coordinate to grid cell coordinate calculation followed by one memory access, meaning you can't really hope for a faster approach (probably would be 2-3 CPU cycles per query.)

I suspect (with some insight) this is how a giant corporation like Google or Facebook would approach the problem (since #3 is not a problem for them, even for the entire world). Even smaller non-profit organizations like NASA use schemes like this, although the scheme NASA uses is far more sophisticated, with multiple scales of resolution/precision.

Clarification

From the comment below, it's clear the last section was not well understood, so I will include some more details.

Suppose your set of points is given by two vectors x and y which contain the x & y coordinates of your data (or lat & long or whatever you are using.)

Then the bounding box of your data is defined by width = max(x)-min(x) and height = max(y)-min(y). Now create a fine mesh grid of NxM points covering the entire bounding box, and map a test point (x_t, y_t) to a grid cell using

u(x_t) = round((x_t - min(x)) / double(width) * N)
v(y_t) = round((y_t - min(y)) / double(height) * M)

Then simply use index = grid[u(x_t), v(y_t)], where index is the index of the closest index point to [x_t, y_t] and grid is a precomputed lookup table that maps each grid cell to the closest index point [x, y].
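As a minimal sketch of the query side in C++ (the Grid structure and its field names are purely illustrative; the scaling uses N-1 and M-1 so that the maximum coordinate still maps to a valid cell, and query points are assumed to lie inside the bounding box):

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative flat N x M lookup table of nearest index-point ids.
struct Grid {
  double min_x, min_y, width, height; // bounding box of the index points
  std::size_t N, M;                   // grid resolution
  std::vector<int> cell;              // cell[u * M + v] = id of nearest index point
};

// Map a query coordinate to a grid cell, then answer with one array lookup.
int nearest_index_point(const Grid &g, double x_t, double y_t) {
  std::size_t u = static_cast<std::size_t>(
      std::lround((x_t - g.min_x) / g.width * (g.N - 1)));
  std::size_t v = static_cast<std::size_t>(
      std::lround((y_t - g.min_y) / g.height * (g.M - 1)));
  return g.cell[u * g.M + v]; // O(1): one index computation + one memory access
}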

For example, suppose that your index points are [0,0] and [2,2] (in that order.) You can create the grid as

grid[0,0] = 0
grid[0,1] = 0
grid[0,2] = 0 // this is a tie
grid[1,0] = 0
grid[1,1] = 0 // this is a tie
grid[1,2] = 1 
grid[2,0] = 1 // this is a tie
grid[2,1] = 1
grid[2,2] = 1

where the right hand side above is either index 0 (which maps to the point [0,0]) or index 1 (which maps to the point [2,2]). Note: due to the discrete nature of this approach you will have ties, where the distance from a grid cell to one index point is exactly equal to its distance to another index point, and you will have to come up with some rule for breaking them. Also note that the number of entries in the grid determines the degree of precision you are trying to reach; obviously, in the example I gave above, the precision is terrible.
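As a minimal sketch of the one-time precomputation, reusing the illustrative Grid structure from the sketch above: it brute-forces the nearest index point for every cell (any nearest-neighbour structure, such as the CGAL one above, could replace the inner loop), and ties are broken towards the lower index simply because a strict < comparison is used:

#include <algorithm>
#include <limits>

// One-time precomputation: assign every grid cell the id of its nearest
// index point. Brute force here for clarity; the index points are given
// as coordinate vectors x and y, as in the text above.
Grid build_grid(const std::vector<double> &x, const std::vector<double> &y,
                std::size_t N, std::size_t M) {
  double min_x = *std::min_element(x.begin(), x.end());
  double min_y = *std::min_element(y.begin(), y.end());
  double width = *std::max_element(x.begin(), x.end()) - min_x;
  double height = *std::max_element(y.begin(), y.end()) - min_y;
  Grid g{min_x, min_y, width, height, N, M, std::vector<int>(N * M, -1)};
  for (std::size_t u = 0; u < N; ++u) {
    for (std::size_t v = 0; v < M; ++v) {
      double cx = min_x + width * u / (N - 1);   // coordinate of this grid cell
      double cy = min_y + height * v / (M - 1);
      double best_d2 = std::numeric_limits<double>::max();
      for (std::size_t i = 0; i < x.size(); ++i) {
        double dx = x[i] - cx, dy = y[i] - cy;
        double d2 = dx * dx + dy * dy;
        if (d2 < best_d2) { // strict '<': ties resolve to the lower index
          best_d2 = d2;
          g.cell[u * M + v] = static_cast<int>(i);
        }
      }
    }
  }
  return g;
}

This precomputation is the expensive part (cells times index points), but it only has to run once before the queries start.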

Upvotes: 2

tucuxi

Reputation: 17945

K-D trees are indeed well suited to the problem. You should first try again with known-good implementations, and if performance is still not good enough, you can easily parallelize queries -- since each query is completely independent of the others, you can achieve a speedup of roughly N by working on N queries in parallel, if you have enough hardware.

I recommend OpenCV's implementation, as mentioned in this answer.

Performance-wise, the ordering of the points that you insert can have a bearing on query times, since implementations may choose whether or not to rebalance unbalanced trees (and, for example, OpenCV's does not do so). A simple safeguard is to insert points in a random order: shuffle the list first, and then insert all points in the shuffled order. While not optimal, this ensures that, with overwhelming probability, the resulting order will not be pathological.
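For example, a minimal sketch of the shuffle step in C++ (the point type and the KD-tree insertion are placeholders; the same idea carries over directly to a C# implementation):

#include <algorithm>
#include <random>
#include <vector>

struct GpsPoint { double lat, lon; }; // illustrative point type

// Shuffle the track points once before bulk-inserting them into the KD-tree,
// so the insertion order is not the (potentially pathological) order along the track.
void shuffle_before_insert(std::vector<GpsPoint> &track_points) {
  std::mt19937 rng(std::random_device{}());
  std::shuffle(track_points.begin(), track_points.end(), rng);
  // ...now insert track_points into the KD-tree in this shuffled order.
}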

Upvotes: 2
