Reputation: 51
Are there any articles about the k-NN search problem for a really huge number of dimensions, like 10k-100k?
Most articles with tests on real-world data operate with 10-50 dimensions, and a few operate with 100-500.
In my case there are ~10^9 points in a ~100k-dimensional feature space, and there is no way to effectively reduce the number of dimensions.
UPD.: At the moment we are trying to adapt and implement VP-trees, but it is clear enough that any tree structure at this dimensionality won't work well.
The second approach is LSH, but accuracy may suffer badly depending on the data distribution.
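For reference, one common LSH family for angular similarity is random-hyperplane hashing: each bit of a point's signature records which side of a random hyperplane it falls on, so nearby points tend to share most bits. A minimal stdlib-only sketch (function names and parameters are illustrative, not from any particular library):

```python
import random

def make_planes(dim, n_bits, seed=0):
    # Draw n_bits random Gaussian hyperplanes for a dim-dimensional space.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(point, planes):
    # Bit i is 1 iff the point lies on the positive side of plane i.
    bits = 0
    for plane in planes:
        dot = sum(p * w for p, w in zip(point, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def hamming(a, b):
    # Number of differing signature bits; a proxy for angular distance.
    return bin(a ^ b).count("1")
```

Candidate neighbors are then the points whose signatures land in the same bucket (or within a small Hamming radius), and only those candidates are compared exactly. Whether this preserves enough accuracy does depend heavily on how the data is distributed, as noted above.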
Upvotes: 2
Views: 196
Reputation: 2367
Take a look at the FLANN library.
In this paper you will find a discussion of how data dimensionality is one of the factors that has a great impact on nearest-neighbor matching performance, along with the solutions adopted in FLANN.
Upvotes: 2
Reputation: 2021
Are you using a kd-tree for nearest-neighbour search? A kd-tree deteriorates to an almost exhaustive search in higher dimensions.
In higher dimensions, it is usually suggested to use approximate nearest-neighbour search. Here is a link to the original paper: http://cvs.cs.umd.edu/~mount/Papers/dist.pdf, and if that is a bit too heavy, try these slides: dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt
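The reason tree methods degrade is distance concentration: as dimensionality grows, the nearest and farthest points end up at nearly the same distance, so pruning rules stop eliminating branches. A quick stdlib-only illustration (sample sizes and the contrast ratio are just for demonstration):

```python
import math
import random

def contrast(dim, n_points=200, seed=0):
    # Ratio of farthest to nearest distance from the origin for uniform
    # random points in [0, 1]^dim; it approaches 1 as dim grows.
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.sqrt(sum(x * x for x in p)) for p in pts]
    return max(dists) / min(dists)
```

In 2 dimensions the ratio is large (the nearest point is much closer than the farthest), while in 1000 dimensions it is close to 1, which is exactly the regime where exact tree indexes lose their advantage over brute force.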
There are many factors affecting the choice when it comes to nearest-neighbour search. Whether you need to load the points entirely into primary memory, or can use secondary memory, should also govern your decision.
Upvotes: 1