Computing and representing centroids with K-means clustering

Question

I am currently stuck on a school exercise. The exercise is as follows.

We will consider a subset of the wild faces data described in berg2005[1]. Load the wildfaces data, Data/wildfaces using the loadmat function. Each data object is a 40*40*3=4800 dimensional vector, corresponding to a 3-color 40*40 pixels image. Compute a k-means clustering of the data with K=10 clusters. Plot a few random images from the data set as well as their corresponding cluster centroids to see how they are represented.

[1] Tamara L Berg, Alexander C Berg, Jaety Edwards, and DA Forsyth. Who's in the picture. Advances in Neural Information Processing Systems, 17:137-144, 2005.

Now to my question: how do I compute the centroids for one image? I can already display a face and calculate the centroids for the dataset. What I don't understand is how to tell which centroids correspond to image 4 (as used in my code sample). Do I have to calculate centroids for the entire dataset X or just for X[4]? What steps do I need to take now to 'plot the corresponding cluster centroids to see how they are represented'?

import scipy.io as spio
import sklearn.cluster as cl
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

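# Load the data; each row of X is one flattened 40x40x3 image (4800 values).
faces = spio.loadmat('Data/wildfaces.mat', squeeze_me=True)
X = faces['X']

# k_means returns a tuple: (centroids, labels, inertia).
Y = cl.k_means(X, 10)
centroids = Y[0]  # one row per cluster
clusters = Y[1]   # cluster index assigned to each image

# Display image 4.
imshow(np.reshape(X[4, :], (3, 40, 40)).T)
plt.show()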

Has QUIT--Anony-Mousse · Accepted Answer

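You already have the centroids, one per cluster. K-means is run on the entire dataset X, not on a single image.

You don't need to compute them, only display them.

Check the contents of Y: Y[0] holds the 10 centroids and Y[1] holds the cluster index assigned to each image, so the centroid for image 4 is centroids[clusters[4]].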
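
To make that concrete, here is a minimal sketch of looking up and plotting the centroid that belongs to each image. Some assumptions: the random array X stands in for faces['X'] so the snippet runs without the .mat file, and the helper names centroid_for, to_image and plot_pairs are made up for illustration. The reshape follows the imshow call in the question.

```python
import numpy as np
from sklearn.cluster import k_means

# ASSUMPTION: random stand-in for faces['X'] so the sketch runs without the
# .mat file; 60 rows, each a flattened 3x40x40 image with values in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((60, 4800))

# Cluster the WHOLE data set once; k_means returns (centroids, labels, inertia).
centroids, labels, _ = k_means(X, 10, n_init=1, random_state=0)


def centroid_for(centroids, labels, i):
    """Return the centroid of the cluster that sample i was assigned to."""
    return centroids[labels[i]]


def to_image(row):
    """Reshape a 4800-vector to (40, 40, 3), as in the question's imshow call."""
    return np.reshape(row, (3, 40, 40)).T


def plot_pairs(X, centroids, labels, indices):
    """Show each chosen image (top row) above its cluster centroid (bottom row)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, axes = plt.subplots(2, len(indices), squeeze=False)
    for col, i in enumerate(indices):
        axes[0, col].imshow(to_image(X[i]))
        axes[0, col].set_title(f"image {i}")
        axes[1, col].imshow(to_image(centroid_for(centroids, labels, i)))
        axes[1, col].set_title(f"centroid {labels[i]}")
        axes[0, col].axis("off")
        axes[1, col].axis("off")
    plt.show()


# A few random image indices to inspect (hypothetical choice of 4 images).
chosen = rng.choice(len(X), size=4, replace=False)
```

Calling plot_pairs(X, centroids, labels, chosen) should then draw the figure. With the real data, swap the synthetic X for faces['X']; if the pixel values turn out to be stored on a 0..255 scale rather than 0..1, dividing by 255 before imshow is probably needed, since imshow clips float RGB data to [0, 1].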
