Reputation: 1524
I have around 10 K points in 5 dimensional space. We can assume that the points are randomly distributed in space (0,0,0,0,0) and (100,100,100,100,100). Clearly, the whole data set can easily reside in memory.
I would like to know which algorithm for k nearest neighbour would run faster, kd-tree or RTree.
Although I have some very high level idea of these two algorithms, I am not sure which will run faster, and why. I am open to exploring other algorithms if any, which could run fast. Please, if possible, specify why an algorithm may run faster.
Upvotes: 3
Views: 4835
Reputation: 77454
This depends on various parameters. Most importantly on your capability to implement these algorithms.
I've personally found bulk-loaded R*-trees to be faster for large data, probably because they have a better fan-out. Bulk-loaded R-trees is a more fair comparison, as kd-trees are commonly bulk-loaded (in fact, they don't support incremental operation very well at all).
For tiny data, kd-trees will likely be faster, plus they are much simpler to implement.
For other things, please refer to this earlier question / answer:
https://stackoverflow.com/a/11109467/1060350
Upvotes: 5