Reputation: 296
I want to implement the same function as MATLAB's rangesearch. The problem is that the data is really big (6.7 million 3D points). I read Pairwise Distance Calculation in c++ and improved my code, but it is not fast enough yet. Computing the full distance matrix is too RAM-consuming for this amount of data, so that approach is not applicable. My code is below; let me know if there is a way to make it faster. I'm also not sure whether parallelization would help here. The data is sorted on the 1st dimension, and I want to make sure that for every point the first neighbour is the point itself.
std::vector<std::vector<long int>> rangesearch(std::vector<std::vector<float>> &data, float radius) {
    float distmat = 0;
    float xdist = 0;
    std::vector<std::vector<long int>> indices(data.size());
    // This makes sure that the first neighbour of each point is itself.
    for (unsigned long int i = 0; i < data.size(); i++) {
        indices[i].push_back(i);
    }
    // Instead of computing sqrt() of every distance, square the radius once
    // and compare it against squared distances again and again, which is faster.
    radius = std::pow(radius, 2);
    for (unsigned long int i = 0; i < data.size(); i++) {
        for (unsigned long int j = i + 1; j < data.size(); j++) {
            xdist = std::pow(data[i][0] - data[j][0], 2);
            distmat = xdist + std::pow(data[i][1] - data[j][1], 2) + std::pow(data[i][2] - data[j][2], 2);
            if (distmat <= radius) {
                indices[i].push_back(j);
                indices[j].push_back(i);
            }
            // This is just to make the search faster. Data should be sorted
            // on the X coordinate. Then, once the distance along x alone is
            // bigger than the radius, it will be even bigger for all the
            // remaining points, so there is no need to check them: skip the rest!
            if (xdist > radius)
                break;
        }
    }
    return indices;
}
Upvotes: 0
Views: 246
Reputation: 50338
The problem that you are trying to solve looks like a nearest-neighbour search or an n-body simulation.
The worst-case complexity of the current code is O(n^2). With n = 6.7e6, this means about ten thousand billion iterations. Sure, the breaking condition, parallelism and low-level optimizations will help a bit, but the resulting code will still be very slow. Thus, you need to find a better algorithm.
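To show what parallelism can buy on the current approach: the posted loop appends to both indices[i] and indices[j], so it has a data race if parallelized directly. Here is a sketch of a race-free restructuring (the function name rangesearch_par is made up; each point scans its sorted-x neighbourhood in both directions and writes only to its own row, and the OpenMP pragma is a no-op unless you compile with -fopenmp):

```cpp
#include <cmath>
#include <vector>

// Sketch: race-free restructuring of the posted loop. Each point i scans
// its sorted-x neighbourhood in both directions and appends only to its
// own indices[i], so iterations of the outer loop are independent and can
// run in parallel. The pragma is ignored without OpenMP support.
std::vector<std::vector<long>> rangesearch_par(
        const std::vector<std::vector<float>>& data, float radius) {
    const long n = static_cast<long>(data.size());
    const float r2 = radius * radius;  // compare squared distances, no sqrt
    std::vector<std::vector<long>> indices(n);
    #pragma omp parallel for schedule(dynamic, 1024)
    for (long i = 0; i < n; ++i) {
        indices[i].push_back(i);  // first neighbour is the point itself
        // Forward scan: data is sorted on x, so stop once the x-gap alone
        // already exceeds the radius.
        for (long j = i + 1; j < n; ++j) {
            const float dx = data[i][0] - data[j][0];
            if (dx * dx > r2) break;
            const float dy = data[i][1] - data[j][1];
            const float dz = data[i][2] - data[j][2];
            if (dx * dx + dy * dy + dz * dz <= r2) indices[i].push_back(j);
        }
        // Backward scan, same early exit.
        for (long j = i - 1; j >= 0; --j) {
            const float dx = data[i][0] - data[j][0];
            if (dx * dx > r2) break;
            const float dy = data[i][1] - data[j][1];
            const float dz = data[i][2] - data[j][2];
            if (dx * dx + dy * dy + dz * dz <= r2) indices[i].push_back(j);
        }
    }
    return indices;
}
```

Compared with the original, each pair is examined twice (once from each side), but both scans stay short when the radius is small, and each row still lists the point itself first. It does not change the O(n^2) worst case, though.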
The common way to solve this kind of problem is to put all the elements in a space-partitioning tree data structure (such as a BSP tree, quadtree or octree). Such a data structure lets you locate the elements near a given location in O(log(n)) time. As a result, the overall complexity of this method is O(n log(n))!
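For a fixed small radius, a uniform grid (spatial hashing) gives the same practical effect as a tree and is simpler to write: bucket the points into cubic cells of side radius, so every neighbour of a point lies in the 3x3x3 block of cells around it. Below is a minimal sketch under that assumption (the function name and the hash constants are illustrative; hash collisions between distant cells are harmless because every candidate is checked against the true squared distance):

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch: uniform-grid range search with cell side equal to the radius.
std::vector<std::vector<long>> rangesearch_grid(
        const std::vector<std::vector<float>>& data, float radius) {
    const float r2 = radius * radius;
    auto cell = [radius](float v) {
        return static_cast<std::int64_t>(std::floor(v / radius));
    };
    auto key = [](std::int64_t x, std::int64_t y, std::int64_t z) {
        // classic 3D spatial-hash mix of the three cell coordinates
        return (x * 73856093LL) ^ (y * 19349663LL) ^ (z * 83492791LL);
    };

    // Bucket point indices by the hash key of their cell.
    std::unordered_map<std::int64_t, std::vector<long>> grid;
    for (long i = 0; i < static_cast<long>(data.size()); ++i)
        grid[key(cell(data[i][0]), cell(data[i][1]), cell(data[i][2]))].push_back(i);

    std::vector<std::vector<long>> indices(data.size());
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        indices[i].push_back(i);  // first neighbour is the point itself
        const std::int64_t cx = cell(data[i][0]);
        const std::int64_t cy = cell(data[i][1]);
        const std::int64_t cz = cell(data[i][2]);
        // All true neighbours lie in the 27 cells around the point's cell.
        for (std::int64_t dx = -1; dx <= 1; ++dx)
            for (std::int64_t dy = -1; dy <= 1; ++dy)
                for (std::int64_t dz = -1; dz <= 1; ++dz) {
                    auto it = grid.find(key(cx + dx, cy + dy, cz + dz));
                    if (it == grid.end()) continue;
                    for (long j : it->second) {
                        if (j == i) continue;
                        const float ddx = data[i][0] - data[j][0];
                        const float ddy = data[i][1] - data[j][1];
                        const float ddz = data[i][2] - data[j][2];
                        if (ddx * ddx + ddy * ddy + ddz * ddz <= r2)
                            indices[i].push_back(j);
                    }
                }
    }
    return indices;
}
```

With points spread out and a small radius, each bucket stays small, so the expected cost per point is roughly constant instead of O(n).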
Important note: I assumed the radius is small. Indeed, if radius is too big, then you need to iterate over the whole tree, resulting in a quadratic complexity. Actually, in this case the size of the written output is itself quadratic, so O(n^2) would unfortunately be the optimal complexity.
Upvotes: 2