Reputation: 13323
I want to store from 50 to 10 000 vectors in 3 to 20 dimensions. I want to know in which structure to store the vectors in order to be able to solve the nearest neighbor or approximate nearest neighbor problem quickly. I will use the Euclidean, Manhattan, Max and weighted Manhattan metric.
I started to read into the problem and found out (correct me if I am wrong) that when the number of dimensions is so much smaller than the number of vectors, that the kd-trees will do it. The performance can be deeply sublinear (O(log(n))).
The problem is that the structure will be changing very rapidly. Each vector can change thousands of times in the course of the program. Furthermore the vectors needn't maintain their approximate location or scale. The entire structure can "travel" through R^n.
The problem is that in order to maintain the high performance of the kd-tree, one needs to do rebalancing now and then. This operation can be as expensive as rebuilding the entire tree.
How to solve the problem of rapidly changing kd-tree?
Upvotes: 2
Views: 251
Reputation: 13556
You should do an amortized analysis of algorithms operating on different data structures. The results will vary on the order of the operations on that specific data structure you are using.
I would suggest to also have a look at the R-tree. It might also be a good idea to have a look at a static grid, because updating that structure might perform reasonably well if updates to the data structure are more frequent than queries.
If updates to the data structure are so frequent than it might be a good idea to not always update the data structure with every change, but to use an outdated one first and afterwards do a search over all changed elements. That way you could do batch changes to the data structure, which might be a bit more efficient. That's one thing the amortized analysis could also answer.
You should also have a look at the literature available for multidimensional trees. Their you will surely find proposals for more efficient operations on data structures or ones you haven't thought about yet. However I can't recommend literature yet.
Upvotes: 2