abolfazl taghribi

Reputation: 296

Fast rangesearch implementation in C++

I want to implement the same functionality as MATLAB's rangesearch in C++. The problem is that the data is really big (6.7 million 3D points). I read Pairwise Distance Calculation in C++ and improved my code, but it is not fast enough yet. Computing the full distance matrix is not applicable because it consumes too much RAM for this amount of data. I put my code here; let me know if there is a way to make it faster. I'm also not sure whether parallelization would help here. The data is sorted on the 1st dimension, and I want to make sure that for every point the first neighbour is the point itself.

#include <vector>

std::vector<std::vector<long int>> rangesearch(std::vector<std::vector<float>> &data, float radius) {
    float distmat = 0;
    float xdist = 0;
    std::vector<std::vector<long int>> indices(data.size());

    // This makes sure that the first neighbour of each point is itself.
    for (unsigned long int i = 0; i < data.size(); i++) {
        indices[i].push_back(i);
    }

    // Instead of computing sqrt() of every distance, square the radius once
    // and compare squared distances against it, which is faster.
    radius = radius * radius;

    for (unsigned long int i = 0; i < data.size(); i++) {
        for (unsigned long int j = i + 1; j < data.size(); j++) {

            xdist = (data[i][0] - data[j][0]) * (data[i][0] - data[j][0]);
            distmat = xdist
                    + (data[i][1] - data[j][1]) * (data[i][1] - data[j][1])
                    + (data[i][2] - data[j][2]) * (data[i][2] - data[j][2]);

            if (distmat <= radius) {
                indices[i].push_back(j);
                indices[j].push_back(i);
            }

            // This just makes the preprocessing faster. The data must be
            // sorted on the X coordinate: once the X distance alone exceeds
            // the radius, every later point is even farther away, so the
            // rest of the inner loop can be skipped.
            if (xdist > radius)
                break;
        }
    }
    return indices;
}

Upvotes: 0

Views: 246

Answers (1)

Jérôme Richard

Reputation: 50338

The problem that you are trying to solve looks like a fixed-radius nearest-neighbor search or an n-body simulation.

The worst-case complexity of the current code is O(n^2). With n = 6.7e6, this means roughly 2e13 pair checks. Sure, the breaking condition, parallelism and low-level optimizations will help a bit, but the resulting code will still be very slow. Thus, you need to find a better algorithm.

The common way to solve this kind of problem is to put all elements into a space-partitioning tree data structure (such as a quadtree, an octree, or a k-d tree). Such a data structure lets you locate the elements near a given location in O(log(n)) time. As a result, the overall complexity of this method is O(n log(n))!
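For a fixed search radius, a uniform grid (spatial hashing) with cell size equal to the radius is an even simpler alternative with the same near-linear behaviour when the radius is small: each point only has to be tested against the points in its own cell and the 26 surrounding cells. Below is a minimal sketch of that idea, not the asker's code; the names `gridRangeSearch` and `cellKey` are made up for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Point = std::array<float, 3>;

// Pack three cell coordinates into one 64-bit hash key (21 bits each).
// Collisions are only possible for cells 2^21 apart, fine for a sketch.
static std::int64_t cellKey(int x, int y, int z) {
    auto p = [](int v) { return static_cast<std::int64_t>(v & 0x1FFFFF); };
    return p(x) | (p(y) << 21) | (p(z) << 42);
}

std::vector<std::vector<long>> gridRangeSearch(const std::vector<Point> &data,
                                               float radius) {
    const float r2 = radius * radius;
    const float inv = 1.0f / radius;

    auto cellOf = [&](const Point &p, int c[3]) {
        for (int d = 0; d < 3; ++d)
            c[d] = static_cast<int>(std::floor(p[d] * inv));
    };

    // Bucket every point index by the grid cell it falls into.
    std::unordered_map<std::int64_t, std::vector<long>> grid;
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        int c[3];
        cellOf(data[i], c);
        grid[cellKey(c[0], c[1], c[2])].push_back(i);
    }

    std::vector<std::vector<long>> indices(data.size());
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        indices[i].push_back(i);  // first neighbour is the point itself
        int c[3];
        cellOf(data[i], c);
        // Only the 3x3x3 block of cells around the point can hold neighbours.
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = grid.find(cellKey(c[0] + dx, c[1] + dy, c[2] + dz));
                    if (it == grid.end()) continue;
                    for (long j : it->second) {
                        if (j == i) continue;
                        float ddx = data[i][0] - data[j][0];
                        float ddy = data[i][1] - data[j][1];
                        float ddz = data[i][2] - data[j][2];
                        if (ddx * ddx + ddy * ddy + ddz * ddz <= r2)
                            indices[i].push_back(j);
                    }
                }
    }
    return indices;
}
```

Each point is inserted once and inspected against a constant number of cells, so for roughly uniform data the total work is close to O(n) plus the size of the output.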

Important note: I assumed the radius is small. Indeed, if the radius is too big, then you need to iterate over almost the whole tree, resulting in a quadratic running time. Actually, in that case the size of the written output is itself quadratic, so O(n^2) is unfortunately the optimal complexity.

Upvotes: 2
