John Stud

Reputation: 1779

Interpreting K-Means cluster_centers_ output

I am having difficulty interpreting the results of the cluster_centers_ array output.

Consider the following MWE:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import numpy as np

# Load the data
iris = load_iris()
X, y = iris.data, iris.target

# shuffle the data
shuffle = np.random.permutation(np.arange(X.shape[0]))
X = X[shuffle]

# scale X
X = (X - X.mean()) / X.std()

# set up the K-means model with 2 clusters
km = KMeans(n_clusters=2, n_init=10)

# fit the data
km.fit(X);

# km centers
km.cluster_centers_
array([[ 1.43706001, -0.29278015,  0.75703227, -0.89603057],
       [ 0.78079175, -0.04797174, -0.96467783, -1.60799713]])

In the array above, it is unclear to me how to use these values to identify the cluster centers. I told K-Means to give me 2 clusters, yet it returns 8 values, and they cannot be x, y coordinates for all 4 features.

If I plot 1.43706001, -0.29278015, this makes intuitive sense: it's a point right in the middle of a predicted cluster.

[Image: Example Cluster Location]

So if this is the case, and my second cluster center is 0.78079175, -0.04797174, what are the values in columns 2 and 3 for?

Upvotes: 4

Views: 7840

Answers (1)

Kate Melnykova

Reputation: 1873

From the documentation: cluster_centers_ is an ndarray of shape (n_clusters, n_features).

The iris dataset has 4 features (X.shape = (150, 4)), and you asked KMeans for two centroids in that 4-dimensional feature space. cluster_centers_ gives you exactly that: each row of the array holds the coordinates of one centroid in R^4. The values in columns 2 and 3 are therefore each centroid's coordinates along the third and fourth features, not extra points.
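To make this concrete, here is a minimal sketch of one way to read the array. It assumes the unscaled iris data and a fixed random_state=0 (both are my choices for reproducibility, not part of the question), and the names centers, centers_2d and member_means are just illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # shape (150, 4): 150 samples, 4 features

# random_state=0 is an assumption added here so the run is repeatable
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# One row per cluster, one column per feature
centers = km.cluster_centers_
print(centers.shape)

# Label each coordinate of each centroid with its feature name
for k, center in enumerate(centers):
    coords = ", ".join(
        f"{name}={value:.2f}" for name, value in zip(iris.feature_names, center)
    )
    print(f"cluster {k}: {coords}")

# For a 2-D scatter plot, pick the same two feature columns for the
# centroids as for the data points (here features 0 and 1)
i, j = 0, 1
centers_2d = centers[:, [i, j]]  # shape (2, 2): an (x, y) pair per cluster

# Each centroid should be close to the per-feature mean of the points
# assigned to that cluster
member_means = np.array([X[km.labels_ == k].mean(axis=0) for k in range(2)])
print(np.abs(centers - member_means).max())
```

So when you plot only the first two numbers of a row, you are looking at the projection of a 4-dimensional centroid onto the first two features. Choosing a different pair (i, j) gives the same centroid seen along other axes.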

Upvotes: 8
