Reputation: 395
I have a dataset of images, and I want to group them into different groups based on content. So far I have computed the median of each image, with the idea of grouping the images into clusters based on those median values. How can I do that? I searched Google for clustering, but the results were about clustering based on colors rather than clustering images into groups. Can I automatically cluster my dataset into groups based on the median or some other technique? This is what I have tried so far:
from PIL import Image
import numpy as np
import os

Median = []
k = []

def get_imlist(path):
    # Return the full paths of all .jpg files in a directory
    return [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.jpg')]

path = 'D:/Images/dataset'
imlist = get_imlist(path)
for file in imlist:
    head, tail = os.path.split(file)
    # Load the image as grayscale and take its median pixel value
    im = np.array(Image.open(file).convert('L'))
    m = np.median(im)
    M = [m, tail]
    print('.')
    Median.append(M)

# Sort the [median, filename] pairs by median value
Results = sorted(Median, key=lambda median: median[0])
print(Results)
Upvotes: 3
Views: 2268
Reputation: 808
k-means is a common clustering method and is available in OpenCV: http://docs.opencv.org/modules/core/doc/clustering.html
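For readers following along in Python, as in the question, the idea behind k-means can be sketched in plain Python. This is only an illustrative sketch, not OpenCV's implementation: the names `kmeans` and `squared_distance`, the `seed` parameter, and the plain random initialisation are assumptions made for the example (OpenCV's `KMEANS_PP_CENTERS` flag selects a smarter seeding).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length numeric lists) into k groups.

    Returns (labels, centers). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Example: two obvious groups of 1-D features.
features = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
labels, centers = kmeans(features, k=2)
print(labels)  # the first three share one label, the last three the other
```

Each row of `features` stands for one image's feature vector, so the returned `labels` list says which group each image falls into.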
Before you cluster, it is recommended that you use a representation with fewer dimensions than the full n*m set of pixels, for two main reasons: robustness to noise and a lower computational cost for the clustering process. The choice of representation may be critical to the perceived quality of the clusters and will largely depend on your application. My current favorite is the GIST descriptor (C++: http://lear.inrialpes.fr/software, MATLAB: http://people.csail.mit.edu/torralba/code/spatialenvelope/). However, that is not in OpenCV, so here I will use a gray-level histogram, which reduces the dimensions from m*n to b, the number of bins.
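As a rough Python illustration of that reduction, the sketch below turns a flat list of 8-bit gray values into a b-bin feature vector. The function name `gray_histogram` and its parameters are hypothetical, and it normalizes the bins to sum to 1, which is a simpler choice than the min-max normalization used in the C++ code in this answer.

```python
def gray_histogram(pixels, bins=128, max_value=256):
    """Return a `bins`-bin histogram of gray levels in [0, max_value).

    The bins are normalized to sum to 1 so that images of different
    sizes give comparable feature vectors.
    """
    pixels = list(pixels)
    counts = [0] * bins
    for value in pixels:
        # Map the gray level onto its bin index.
        counts[int(value) * bins // max_value] += 1
    total = float(len(pixels))
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# Example: a tiny 4-pixel "image" reduced to 2 bins (dark vs. bright).
print(gray_histogram([0, 0, 255, 255], bins=2))  # [0.5, 0.5]
```

With the question's code, `im.ravel().tolist()` would give a suitable flat list of pixel values for each image.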
This assumes a vector of gray-level input images named frames.
// Assumes #include <opencv2/opencv.hpp> and <vector>, using namespace cv and std,
// and that frames is a vector<Mat> of 8-bit gray images of identical size.

// set up histogram
int histSize = 128;                 // number of bins
float range[] = { 0, 256 };         // full 8-bit gray range (upper bound is exclusive)
const float* histRange = { range };
bool uniform = true; bool accumulate = false;
Mat_<float> dataHists;              // one feature row per image
cv::Mat grayImg;
Mat hist_i;
for (size_t i = 0; i < frames.size(); i++)
{
    grayImg = frames[i];
    // histogram gray image
    calcHist(&grayImg, 1, 0, Mat(), hist_i, 1, &histSize, &histRange, uniform, accumulate);
    normalize(hist_i, hist_i, 0, hist_i.rows, NORM_MINMAX, -1, Mat());
    // transpose for feature vector
    hist_i = hist_i.t();
    // add to feature vectors for k-means
    dataHists.push_back(cv::Mat(hist_i));
}

// k-means
int k = 100;                        // number of clusters; must not exceed frames.size()
cv::Mat bestLabels;
cv::kmeans(dataHists, k, bestLabels, TermCriteria(), 3, KMEANS_PP_CENTERS);

// have a look: stack the images of each cluster vertically
vector<cv::Mat> clusterViz(k);      // one visualization per cluster
for (int i = 0; i < bestLabels.rows; i++)
{
    // append image i to the cluster it was assigned to
    clusterViz[bestLabels.at<int>(i)].push_back(cv::Mat(frames[i]));
}
namedWindow("clusters", WINDOW_NORMAL);
for (size_t i = 0; i < clusterViz.size(); i++)
{
    if (clusterViz[i].empty()) continue;   // imshow fails on an empty image
    cv::imshow("clusters", clusterViz[i]);
    cv::waitKey();
}
Hope this helps you.
Upvotes: 1