MLMLTL

Reputation: 1559

OpenCV Clustering Bag Of Words K-Means

Using SIFT Detector and Extractor, with FlannBased Matcher, and the Dictionary set up for the BOWKMeansTrainer like this:

TermCriteria termCrit(CV_TERMCRIT_ITER, 100, 0.001); // stop k-means after 100 iterations
int dictionarySize = 15; // -- same as the number of input images
int retries = 1;
int flags = KMEANS_PP_CENTERS; // k-means++ center initialization

BOWKMeansTrainer trainBowTrainer(dictionarySize, termCrit, retries, flags);

the clustered array of extracted keypoint descriptors comes out as [128 x 15].

Then, when using the BOWImgDescriptorExtractor as the extractor on a different set of 15 images, with the previously clustered array as its vocabulary, the array comes out as [15 x 15].

Why?

I can't find much on how this all actually works; most of what I find just says where to put it and what values to give it.
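For reference, the rest of the pipeline looks roughly like this (a sketch assuming OpenCV 2.4 with the nonfree module; trainingImages, testImages, and the loops are placeholders standing in for my actual loading code, continuing from the trainBowTrainer set up above):

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/nonfree.hpp>
#include <vector>

std::vector<cv::Mat> trainingImages, testImages; // assumed already loaded, 15 each

cv::initModule_nonfree(); // required for "SIFT" in OpenCV 2.4
cv::Ptr<cv::FeatureDetector> detector = cv::FeatureDetector::create("SIFT");
cv::Ptr<cv::DescriptorExtractor> extractor = cv::DescriptorExtractor::create("SIFT");
cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create("FlannBased");

// First set of 15 images: accumulate SIFT descriptors, then cluster
for (size_t i = 0; i < trainingImages.size(); ++i) {
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    detector->detect(trainingImages[i], keypoints);
    extractor->compute(trainingImages[i], keypoints, descriptors);
    trainBowTrainer.add(descriptors);
}
cv::Mat vocabulary = trainBowTrainer.cluster(); // this is the [128 x 15] matrix

// Second set of 15 images: compute BOW descriptors against that vocabulary
cv::BOWImgDescriptorExtractor bowExtractor(extractor, matcher);
bowExtractor.setVocabulary(vocabulary);

cv::Mat allBowDescriptors;
for (size_t i = 0; i < testImages.size(); ++i) {
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat bowDescriptor;
    detector->detect(testImages[i], keypoints);
    bowExtractor.compute(testImages[i], keypoints, bowDescriptor);
    allBowDescriptors.push_back(bowDescriptor); // one row per image
}
// allBowDescriptors ends up as the [15 x 15] matrix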

Upvotes: 1

Views: 1277

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77454

The result should always be [n x 15] if you have n images and k=15.

But in the first run you looked at the vocabulary, not at the feature representation of the images. The 128 you are seeing there is the SIFT dimensionality; those are 15 "typical" SIFT vectors (the cluster centers); they are not descriptions of your images.

You need to read up on the BoW model and why the outcome should always be a vector of length k (potentially sparse, i.e. with many 0s) for each image. I have the impression you expect this approach to produce one 128-dimensional feature vector for each image. Also, k=15 is probably too small, and the training data set is too small as well.
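Roughly, this is what happens per image when the BOW descriptor is computed (a conceptual sketch of the idea, not the actual OpenCV implementation; the function name and normalization details are mine):

#include <opencv2/core/core.hpp>
#include <algorithm>
#include <vector>

// Build one image's BOW descriptor of length k from its SIFT descriptors.
// 'sift' is m x 128 (m keypoints in this image), 'vocabulary' is k x 128.
std::vector<float> bowHistogram(const cv::Mat& sift, const cv::Mat& vocabulary) {
    std::vector<float> hist(vocabulary.rows, 0.f);        // k bins, k = 15 here
    for (int i = 0; i < sift.rows; ++i) {
        int best = 0;
        double bestDist = cv::norm(sift.row(i), vocabulary.row(0));
        for (int j = 1; j < vocabulary.rows; ++j) {        // nearest "visual word"
            double d = cv::norm(sift.row(i), vocabulary.row(j));
            if (d < bestDist) { bestDist = d; best = j; }
        }
        hist[best] += 1.f;                                  // count occurrences of that word
    }
    for (size_t b = 0; b < hist.size(); ++b)
        hist[b] /= std::max(sift.rows, 1);                  // normalize the histogram
    return hist;                                            // length k (15), not 128
}

Stacking one such k-length vector per image is what produces the [n x 15] matrix; the 128 only ever appears inside the vocabulary.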

Upvotes: 1
