Reputation: 3510
I am classifying medical images using a bag-of-words (BOW) model. I did the following to extract the feature vector:
After feature extraction I tried PCA, feature selection, changing the number of clusters for KMeans, etc. to improve the accuracy. But in my case, BOW learned on pixel values (1) outperforms BOW learned on features (2): 90% versus 70%. My features are good: when I use them to classify the images in another framework, I get more than 95% accuracy.
My question is: why does BOW learned on pixels perform better than BOW learned on features?
Normal-abnormal colonoscopy image classification
Figure 1: a normal colon image
Figure 2: an image with polyp
Upvotes: 2
Views: 1918
Reputation: 4523
My understanding of your two methods for extracting features from an image patch is:
Feature selection = "run PCA, k-means, or select some subset of pixels, and construct a vector of these extracted values"
Pixel Values = "create a vector from RGB values of the image"
In fact, to get good results from BOW features, people often derive individual features using relatively complicated algorithms.
In the project at http://vision.stanford.edu/projects/totalscene/index.html (paper in reference #1), the authors take BOW features from both image blocks and a segmentation. For the image blocks, they extract SIFT features, and for each segment they take shape, color, location, and texture features (see section 2.1 and follow the reference for a better description of the features they use).
In "Decomposing a Scene into Geometric and Semantically Consistent Regions" (Gould et al.), shape, color, edge, etc. features are derived by, for example, training boosted logistic regression classifiers, Potts models, and Gaussian mixture models.
You probably don't need such intensive techniques to extract features that beat pixel vectors, but you should definitely browse around the literature to see what is effective.
SIFT features, color histograms, and filters to extract texture responses seem to work pretty well and also have a reasonable amount of software library support.
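As a rough illustration of the color-histogram option, here is a minimal sketch of a BOW pipeline built on per-patch color histograms instead of raw pixel vectors. It is written in plain Python so it runs without dependencies; in practice you would likely use a library for the clustering and histogram steps. The function names (`color_histogram`, `kmeans`, `bow_histogram`, `make_patch`), the bin count, and the synthetic reddish/bluish patches are all assumptions made for the example, not anything from your setup.

```python
import random

def color_histogram(patch, bins=4):
    """Per-channel histogram of a patch given as a list of (r, g, b) tuples (0..255).
    Returns a vector of length 3 * bins; each channel's bins sum to 1."""
    hist = [0.0] * (3 * bins)
    for pixel in patch:
        for channel, value in enumerate(pixel):
            idx = min(value * bins // 256, bins - 1)
            hist[channel * bins + idx] += 1.0
    n = float(len(patch))
    return [h / n for h in hist]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Very small Lloyd's k-means; returns k cluster centers (the codebook)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return the
    normalized count of assignments per codeword."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: sq_dist(d, codebook[i]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def make_patch(rng, base, n=16):
    """Hypothetical synthetic patch: n pixels jittered around a base color."""
    return [tuple(max(0, min(255, c + rng.randint(-10, 10))) for c in base)
            for _ in range(n)]

# Synthetic stand-in for patches sampled from one image.
rng = random.Random(0)
patches = ([make_patch(rng, (200, 60, 60)) for _ in range(10)] +
           [make_patch(rng, (60, 60, 200)) for _ in range(10)])

descriptors = [color_histogram(p) for p in patches]  # one descriptor per patch
codebook = kmeans(descriptors, k=2)                  # visual vocabulary
image_vector = bow_histogram(descriptors, codebook)  # final BOW feature vector
```

The point of the sketch is the structure: the descriptor computed per patch (here a color histogram) is what gets clustered into the vocabulary, so swapping in SIFT or texture-filter responses only changes `color_histogram`, not the rest of the pipeline. In a real experiment the codebook would be learned from patches across the whole training set rather than from a single image.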
Upvotes: 3