Reputation: 2749
I have about 130,000 SIFT descriptors and am building a hierarchical k-means index using OpenCV's FLANN module. After that I want to quantize those 130,000 descriptors (and will quantize more later). I am using FLANN's knnSearch method for this, but the result is strange: for every descriptor, the nearest index it returns is the index of that descriptor itself. Instead, it should return the ID of the nearest cluster, which will be one of the leaves of the hierarchical k-means tree.
Should I try k=2?
Here is a code snippet:
int k = 1;
cv::flann::KMeansIndexParams indexParams(8, 4, cvflann::FLANN_CENTERS_KMEANSPP);
cv::flann::Index hik_tree(cluster_data, indexParams);  // cluster_data: CV_32F Mat of SIFT descriptors
cv::Mat indices, dist;
hik_tree.knnSearch(cluster_data, indices, dist, k, cv::flann::SearchParams(64));
Upvotes: 0
Views: 1778
Reputation: 169
k-NN is a supervised classification algorithm, which is why you are supposed to construct the Index
object from your training samples. So use
cv::flann::Index hik_tree(samples, indexParams);
instead of
cv::flann::Index hik_tree(cluster_data, indexParams);
Upvotes: 0
Reputation: 8053
knnSearch looks for the k nearest neighbours in the index (it does not return a cluster ID!). You built your index from cluster_data, and then you matched cluster_data against itself. In that situation, it is not surprising that the closest neighbour of each descriptor is the descriptor itself...
EDIT: If you want to get the centers, have a look at this (from the source of the FLANN library):
/**
* Chooses the initial centers using the algorithm proposed in the KMeans++ paper:
* Arthur, David; Vassilvitskii, Sergei - k-means++: The Advantages of Careful Seeding
*/
template <typename Distance>
class KMeansppCenterChooser : public CenterChooser<Distance>
{
...
Upvotes: 2