adikshit
adikshit

Reputation: 205

OpenCV: In FLANN module KDTree constructor creates 4 trees of same size. why?

In FLANN module the KDTree constructor takes configuration params for creating trees. I see the default value is 4. Can someone please suggest why 4 or why more than one trees are required for nearest neighbor search ?

/**
 * KDTree constructor
 *
 * Params:
 *          inputData = dataset with the input features
 *          params = parameters passed to the kdtree algorithm
 */
KDTreeIndex(const Matrix<ElementType>& inputData, const IndexParams& params = KDTreeIndexParams(),
            Distance d = Distance() ) :
    dataset_(inputData), index_params_(params), distance_(d)
{
    size_ = dataset_.rows;
    veclen_ = dataset_.cols;

    trees_ = get_param(index_params_,"trees",4);   <<<<------------------ default 4
    tree_roots_ = new NodePtr[trees_];

    // Create a permutable array of indices to the input vectors.
    vind_.resize(size_);
    for (size_t i = 0; i < size_; ++i) {

Upvotes: 1

Views: 1551

Answers (1)

Vlad
Vlad

Reputation: 4525

I guess the trees aren't searched till the end; they also vary slightly in the way they constructed. So when 4 trees are partially searched, their combined performance is much better than a single tree partially searched. At the same time they work faster (and possibly more memory efficiently) than a single tree fully searched. At least this is my intuition about this issue.

FLANN - stands for a fast library for approximate nearest neighbours. The word approximate can be a key here. For example, at some point in tree construction one has to find a median of a set of points (to split space efficiently on approximately equal halves). This takes n*log(n) operations but median can be approximated from a smaller sample, say, n=min(100, n/100) - I am making this up to give an example. This will result in a speed-up of approximately x600 for construction at a small price of a little slower search. Also, instead of exploring all dimensions for greater variance to choose a split, we may look at a limited set. This explains why trees can be different.

Another aspect here, approximation to nearest neighbour becomes more important in higher dimensional spaces since standard kd-tree looses its efficiency compared to an exhaustive search. One way to approximate is to inspect only a limited number of leaves to satisfy a search precision. Often times a priority queue is maintained over all the trees to order search by increasing distance. In sum, speed and memory efficiency can be greatly improved with approximation and multiple trees with only little loss in precision.

Upvotes: 1

Related Questions