Jonas

How to automatically cluster my dataset images into groups based on local or global features using Python or OpenCV?

I have a dataset of images, and I want to group them by content. So far I have computed the median gray value of each image, with the idea of grouping the images into clusters based on those median values, but I don't know how to do that step. My code so far is below. When I searched for image clustering, most results were about clustering the colors within a single image rather than clustering whole images into groups. Can I automatically cluster my dataset into groups based on the median, or with some other technique?

from PIL import Image
import numpy as np
import os

Median = []  # one [median gray value, filename] pair per image

def get_imlist(path):
    # full paths of all .jpg files in the directory
    return [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.jpg')]

path = 'D:/Images/dataset'
imlist = get_imlist(path)
for file in imlist:
    head, tail = os.path.split(file)
    # load as 8-bit grayscale and take the median pixel value
    im = np.array(Image.open(file).convert('L'))
    m = np.median(im)
    M = [m, tail]
    print('.')  # progress indicator
    Median.append(M)
Results = sorted(Median, key=lambda median: median[0])
print(Results)
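Because `Results` is already sorted by median, one simple way to turn it into groups is to cut the list at its largest jumps in median value. The sketch below is a pure-Python illustration, not part of the original code: `group_by_gaps` is a hypothetical helper, the medians in the usage example are made up, and the number of groups `k` is an assumption you must choose. Cutting a sorted 1-D list at its k-1 largest gaps is equivalent to single-linkage clustering of the medians into k groups. Keep in mind that a single median only captures overall brightness, so images with very different content can still land in the same group.

```python
def group_by_gaps(results, k):
    """Split sorted (median, filename) pairs into k groups.

    results must already be sorted by median, as produced by
    sorted(Median, key=...) in the question. The k - 1 largest gaps
    between consecutive medians become the group boundaries
    (single-linkage clustering for 1-D data).
    """
    if not 1 <= k <= len(results):
        raise ValueError("k must be between 1 and the number of images")
    # gap i sits between results[i] and results[i + 1]
    gaps = [(results[i + 1][0] - results[i][0], i) for i in range(len(results) - 1)]
    # positions after which a new group starts, in ascending order
    cuts = sorted(i for _, i in sorted(gaps, reverse=True)[:k - 1])
    groups, start = [], 0
    for i in cuts:
        groups.append(results[start:i + 1])
        start = i + 1
    groups.append(results[start:])
    return groups


if __name__ == "__main__":
    # hypothetical medians; in the question they come from np.median(im)
    results = [[12.0, "a.jpg"], [15.0, "b.jpg"], [120.0, "c.jpg"],
               [125.0, "d.jpg"], [240.0, "e.jpg"]]
    for n, group in enumerate(group_by_gaps(results, 3)):
        print(n, [name for _, name in group])
```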

Answers (1)

QED

k-means is a common clustering method, and OpenCV provides an implementation: http://docs.opencv.org/modules/core/doc/clustering.html.
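For reference, this is what k-means (Lloyd's algorithm) does, written as a dependency-free Python sketch rather than OpenCV's implementation: alternately assign every feature vector to its nearest center, then move each center to the mean of its assigned vectors, until the assignments stop changing. The function name `kmeans` and the plain random initialization are assumptions of this sketch; in practice you would call `cv2.kmeans` or `cv::kmeans`.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length feature vectors.

    Returns (labels, centers). The initial centers are k points picked at
    random (an assumption of this sketch, not OpenCV's seeding).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Each point would be one image's feature vector (for example its gray-level histogram), and `labels[i]` is then the group of image `i`.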

Before clustering, it is recommended to use a representation with far fewer dimensions than the full m*n set of pixels, for two main reasons: robustness to noise, and a lower computational cost for the clustering itself. The choice of representation can be critical to the perceived quality of the clusters and will largely depend on your application. My current favorite is the GIST descriptor (C++: http://lear.inrialpes.fr/software, MATLAB: http://people.csail.mit.edu/torralba/code/spatialenvelope/), but it is not in OpenCV. So here I will use a gray-level histogram, which reduces the dimensionality from m*n to b, the number of bins.
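The gray-level histogram representation can be sketched in plain Python. This assumes the image is available as a flat sequence of 8-bit gray values, for example `list(Image.open(f).convert('L').getdata())` with PIL as in the question; `gray_histogram` is a hypothetical name. Dividing by the pixel count (rather than min-max scaling) is a choice of this sketch that makes images of different sizes directly comparable.

```python
def gray_histogram(pixels, bins=128):
    """b-bin histogram of 8-bit gray values (0-255), normalized to sum to 1.

    An m*n image becomes a vector of `bins` numbers, whatever its size.
    """
    if not 1 <= bins <= 256:
        raise ValueError("bins must be between 1 and 256")
    hist = [0] * bins
    for p in pixels:
        # equal-width bins over [0, 256): value p falls into bin p*bins//256
        hist[p * bins // 256] += 1
    total = len(pixels)
    # divide by the pixel count; an empty input returns all-zero counts
    return [h / total for h in hist] if total else hist
```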

The C++ code below assumes a vector of 8-bit grayscale input images named frames.

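#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;
using namespace std;

// frames: 8-bit grayscale images, all of the same width, so that the
// members of a cluster can be stacked vertically for display.
// k must not exceed frames.size().
void clusterFrames(const vector<Mat>& frames, int k = 100)
{
    //set up histogram: 128 bins over the full 8-bit range [0, 256)
    int histSize = 128;
    float range[] = { 0, 256 };
    const float* histRange = { range };
    bool uniform = true; bool accumulate = false;
    Mat_<float> dataHists;   // one row (feature vector) per image

    Mat hist_i;
    for (size_t i = 0; i < frames.size(); i++)
    {
        const Mat& grayImg = frames[i];

        //histogram gray image
        calcHist(&grayImg, 1, 0, Mat(), hist_i, 1, &histSize, &histRange, uniform, accumulate);
        normalize(hist_i, hist_i, 0, hist_i.rows, NORM_MINMAX, -1, Mat());

        //transpose for feature vector (1 x histSize row)
        hist_i = hist_i.t();

        //add to feature vectors for k-means
        dataHists.push_back(Mat(hist_i));
    }

    //k-means: 3 attempts, k-means++ seeding
    Mat bestLabels;
    kmeans(dataHists, k, bestLabels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 100, 1e-4),
           3, KMEANS_PP_CENTERS);

    //have a look: stack the images of each cluster vertically
    vector<Mat> clusterViz(k);
    for (int i = 0; i < bestLabels.rows; i++)
    {
        // bestLabels.at<int>(i) is the cluster index of image i
        clusterViz[bestLabels.at<int>(i)].push_back(frames[i]);
    }

    namedWindow("clusters", WINDOW_NORMAL);
    for (size_t i = 0; i < clusterViz.size(); i++)
    {
        if (clusterViz[i].empty()) continue;   // imshow cannot show an empty Mat
        imshow("clusters", clusterViz[i]);
        waitKey();
    }
}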
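In the `cv::kmeans` call, `KMEANS_PP_CENTERS` selects k-means++ seeding, and the `3` before it is the number of attempts: OpenCV runs the algorithm that many times from different initializations and returns the labeling with the best compactness. The plain-Python sketch below shows what k-means++ seeding does, assuming feature vectors are lists of numbers; `kmeans_pp_seeds` is a hypothetical name and this is not OpenCV's internal code.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers with k-means++ seeding.

    The first center is a uniformly random point; each further center is
    drawn with probability proportional to its squared distance from the
    nearest center chosen so far, so the seeds tend to be spread out.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    # d2[i]: squared distance from points[i] to its nearest chosen center
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # every point coincides with a chosen center: fall back to uniform
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2)[0]
        centers.append(list(nxt))
        d2 = [min(d, squared_distance(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

These seeds would replace the random initial centers of a plain Lloyd's loop; spreading them out reduces the chance of converging to a poor local optimum, and running several attempts (as the `3` does) reduces it further.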

Hope this helps you.
