Reputation: 13
I'm looking for a data structure.
Let's say you have n points p := (x, y) with x, y ∈ [-128, 128].
You now initialize the data structure and add all n points to it.
Now, for any point, you want to efficiently find all points close to it.
More precisely:
Specify a radius r < 1 and a point p.
You want a function F that outputs an (unsorted) list of all points q with d(p, q) < r.
Now I'm looking for a data structure that allows this function to be optimized.
(The standard algorithm is O(n) per query; can you get better than that?)
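For concreteness, the O(n) baseline I have in mind looks roughly like this sketch (the name F_naive is just for illustration):

    import math

    def F_naive(points, p, r):
        """The standard O(n) approach: test every point against the radius."""
        return [q for q in points if math.dist(p, q) < r]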
I'd be grateful for an answer :)
For people that know their stuff and want to help even further:
Say the points move during intervals (with a maximum distance of a < 2).
During every interval, F is called for every point (n times). We now want to extend the optimization so that F stays just as efficient after every interval.
So we want a function G that re-sorts the data structure.
G is called once and F is called n times. We want O(G) + n*O(F) < O(n^2).
In terms of the worst case there is really no room for improvement, so we make the assumption that in every interval, for every point p, at least 50% of all points are outside the radius specified for F.
The values above are somewhat arbitrary and should be exchangeable with any other numbers; I chose them so that the problem is easier to understand. In addition, x and y are floating-point numbers.
I'd like an answer that points me to an article, a Wikipedia entry, or any other source that deals with the same or a similar problem. I really don't expect anyone to spend a whole day explaining a data structure to me ;)
Anyway, all help is appreciated. Thank you very much.
Upvotes: 1
Views: 103
Reputation: 777
This problem reminds me of a particle simulation I wrote some time ago, which had problems similar to the ones you describe. I found a data structure that allows for O(n) complexity (with a few minor deviations in practice, and assuming you choose a good number of chunks).
You can divide the 2-dimensional space into small rectangular chunks (I think squares are best in your case) with side length greater than r. Then you need O(n) time to sort the points into those chunks.
Let k be the total number of chunks you then have. Finding all points within radius r of every point will then take O(n*(n/k)) = O(n²/k), where n/k is the approximate number of points inside each chunk (assuming a roughly uniform distribution, which was true for the particle simulation; I'm not sure about your problem, though). Keep in mind that for every point you also need to look at the 8 neighboring chunks!
You also have an additional O(k), which comes from the fact that you need to iterate through the chunks to access the elements.
So in total this data structure has a complexity of O(n²/k + n + k).
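To make this concrete, here is a minimal sketch of such a chunk grid (the names build_grid and query_radius and the dictionary-of-lists representation are my own illustration, not something from your problem statement):

    import math
    from collections import defaultdict

    def build_grid(points, cell_size):
        """Sort the points into square chunks of side cell_size (chosen > r).
        Rebuilding the grid from scratch is the O(n) sorting step and can
        also serve as the re-sorting function G after every interval."""
        grid = defaultdict(list)
        for i, (x, y) in enumerate(points):
            grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
        return grid

    def query_radius(points, grid, cell_size, p, r):
        """Return the indices of all points q with d(p, q) < r.
        Because cell_size > r, only the chunk containing p and its
        8 neighbors can contain hits."""
        cx, cy = math.floor(p[0] / cell_size), math.floor(p[1] / cell_size)
        hits = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in grid.get((cx + dx, cy + dy), []):
                    qx, qy = points[i]
                    if (qx - p[0]) ** 2 + (qy - p[1]) ** 2 < r * r:
                        hits.append(i)
        return hits

Note that query_radius also reports p itself (its distance is 0 < r); filter that index out if you don't want it.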
Now, to find a relation between n and the optimal k, you have to find the minimum of the function f(k) = a*n²/k + b*n + c*k,
which can be done by taking the derivative and setting it equal to zero:
f'(k) = -a*n²/k² + c = 0
→ n²/k² = c/a = constant
→ n is proportional to k, and therefore, if k can be chosen optimally:
O(n²/k + n + k) = O(n²/n + n + n) = O(n)
The worst case is of course still O(n²), for example when k = 1.
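As a rough usage example of the sketch above (again just my own illustration, using the numbers from the question): you can pick the number of cells per axis so that k grows roughly in proportion to n, while keeping each cell's side length above r:

    import math, random

    n, r = 10_000, 0.9
    points = [(random.uniform(-128, 128), random.uniform(-128, 128)) for _ in range(n)]

    # Aim for k ≈ n cells, but the side length must stay above r,
    # which caps the number of cells per axis for the [-128, 128] domain.
    cells_per_axis = max(1, min(int(256 / r) - 1, int(math.sqrt(n))))
    cell_size = 256.0 / cells_per_axis

    grid = build_grid(points, cell_size)                          # O(n); rerun each interval (G)
    close = query_radius(points, grid, cell_size, points[0], r)   # F for one point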
Upvotes: 1
Reputation: 372784
There are many good data structures you can use to solve the problem efficiently in two dimensions. The k-d tree data structure allows you to search for all points in a rectangle fairly quickly compared to a standard linear search provided that the points are more or less randomly distributed. The quadtree data structure similarly supports this sort of search. R-trees would be another option, though they’re primarily optimized for when you have huge numbers of points and want to store information on disk efficiently.
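If you'd rather not implement one of these yourself, here's a minimal sketch using the k-d tree from SciPy (my own illustration; the sample data is made up to match the question's setup):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    points = rng.uniform(-128, 128, size=(10_000, 2))   # hypothetical sample points

    tree = cKDTree(points)                         # built once per interval (plays the role of G)
    idx = tree.query_ball_point(points[0], r=0.9)  # indices of all points within radius r (F)
    neighbors = points[idx]

    # Queries can also be batched, one per point, giving a list of index lists.
    all_idx = tree.query_ball_point(points, r=0.9)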
My recollection is that in the worst case all of these approaches take time O(n), but only with pathologically chosen inputs. For inputs that have “reasonable” distributions the runtimes of these algorithms are typically much better, hence their widespread use.
Hope this helps!
Upvotes: 0