mrgloom

Reputation: 21632

Knn search for large data?

I'm interested in performing a knn search on a large dataset.

There are some libraries for this, such as ANN and FLANN, but my question is: how do you organize the search when the database does not fit entirely into memory (RAM)?

Upvotes: 2

Views: 7815

Answers (2)

Ando Saabas

Reputation: 1977

It depends on whether your data is very high-dimensional or not. If it is relatively low-dimensional, you can use an existing on-disk R-Tree implementation, such as Spatialite.

If it is higher-dimensional data, you can use X-Trees, but I don't know of any on-disk implementations off the top of my head.

Alternatively, you can implement locality-sensitive hashing with on-disk persistence, for example using mmap.
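A minimal sketch of this idea, using random-hyperplane LSH with the vectors kept on disk via NumPy's memmap (a stand-in for raw mmap); the hash table of bucket → row indices stays in RAM, and only candidate vectors are read from disk at query time. All names, sizes, and the brute-force fallback are illustrative, not a production design:

```python
import os
import tempfile
from collections import defaultdict

import numpy as np

dim, n_bits, n = 8, 16, 1000
rng = np.random.default_rng(0)

# Random hyperplanes: the sign of the projection onto each one gives one hash bit.
planes = rng.standard_normal((n_bits, dim)).astype(np.float32)

def lsh_key(v):
    bits = planes @ v > 0
    return int(bits.dot(1 << np.arange(n_bits)))  # pack bits into an integer key

# Persist the vectors on disk; memmap lets us index them without loading everything.
data = rng.standard_normal((n, dim)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "vectors.dat")
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, dim))
mm[:] = data
mm.flush()

# In-RAM hash table: bucket key -> list of row indices in the on-disk file.
buckets = defaultdict(list)
for i in range(n):
    buckets[lsh_key(mm[i])].append(i)

def query(q, k=5):
    # Only the rows in the matching bucket are paged in from disk.
    cand = buckets.get(lsh_key(q), [])
    if not cand:
        cand = range(n)  # naive fallback when the bucket is empty
    cand = np.asarray(list(cand))
    d = np.linalg.norm(mm[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]
```

In practice you would use several hash tables to raise recall, but the structure is the same: cheap in-RAM hashing to shortlist candidates, disk reads only for the shortlist.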

Upvotes: 4

JonasVautherin

Reputation: 8033

I suppose it depends on how much bigger your index is than the available memory. Here are my first spontaneous ideas:

  1. Supposing it were tens of times the size of the RAM, I would try to cluster my data using, for instance, hierarchical clustering trees (as implemented in FLANN). I would modify the tree implementation so that it keeps the branches in memory and stores the leaves (the clusters) on disk. The appropriate cluster would then have to be loaded for each query. You could then try to optimize this in different ways.

  2. If it was not that much bigger (say, twice the size of the RAM), I would split the dataset into two parts and create one index for each. I would then find the nearest neighbor in each part and choose between the two results.
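The second idea can be sketched as follows, using a brute-force search as a stand-in for a real per-partition index (e.g. one built with FLANN); the point is only the merge step: query each half independently, keep per-partition top-k candidates with their distances, and take the global top-k from the merged list. Names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 16)).astype(np.float32)

# Two partitions; in the real scenario only one index fits in RAM at a time.
half_a, half_b = data[:500], data[500:]

def knn_partition(part, offset, q, k):
    # Brute-force stand-in for an index built on this partition.
    # `offset` maps local row numbers back to global dataset indices.
    d = np.linalg.norm(part - q, axis=1)
    idx = np.argsort(d)[:k]
    return [(float(d[i]), offset + int(i)) for i in idx]

def knn_two_parts(q, k=5):
    # Query each partition, then merge the candidate lists by distance
    # and keep the global top-k.
    merged = knn_partition(half_a, 0, q, k) + knn_partition(half_b, 500, q, k)
    merged.sort()
    return [i for _, i in merged[:k]]
```

Since each partition returns its own top-k, the merged list is guaranteed to contain the true global top-k, so the split costs no accuracy, only a second index lookup.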

Upvotes: 4

Related Questions