user13467695


Why do I get only one inertia value, and not 4, when I cluster into 4 clusters with K-means?

I have a dataframe and clustered it into 4 clusters using sklearn's KMeans class:

from sklearn.cluster import KMeans

km = KMeans(n_clusters=4, init='random', n_init=10, max_iter=10,
            tol=1e-4, random_state=10, algorithm='full')
km.fit(df)

So I have 4 clusters, but when I do this:

km.inertia_

I get only one value:

1732.350

However, according to the definition, inertia is the sum of squared distances of samples to their closest cluster center. So there should be 4 inertia values, not 1, or am I wrong?

Upvotes: 1

Views: 739

Answers (2)

fdermishin

Reputation: 3686

Inertia is used as a criterion to select the best clustering among several runs. To find the best one, all clusterings must be ordered in some way. This is done by assigning a single scalar value, the inertia, to each of them so they can be easily compared. This value is not meant to be used in any other way.
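As a rough illustration of that selection step, the sketch below runs K-means several times with one random initialization each and keeps the run with the lowest inertia. It imitates what `n_init` does and is not the library's internal code; the `make_blobs` data and the seed range are assumptions standing in for the question's dataframe.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the question's dataframe (an assumption).
X, _ = make_blobs(n_samples=300, centers=4, random_state=10)

# Several independent runs, each with a single random initialization.
runs = [
    KMeans(n_clusters=4, init='random', n_init=1, random_state=seed).fit(X)
    for seed in range(5)
]

# One scalar per run makes the runs directly comparable.
inertias = [run.inertia_ for run in runs]
best = min(runs, key=lambda run: run.inertia_)

print(inertias)
print(best.inertia_)
```

Because each run is summarized by one number, picking the best run is a plain `min` over the list.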

Here is the current implementation of the calculation for dense input (from the scikit-learn source code):

cpdef floating _inertia_dense(
        np.ndarray[floating, ndim=2, mode='c'] X,  # IN
        floating[::1] sample_weight,               # IN
        floating[:, ::1] centers,                  # IN
        int[::1] labels):                          # IN
    """Compute inertia for dense input data
    Sum of squared distance between each sample and its assigned center.
    """
    cdef:
        int n_samples = X.shape[0]
        int n_features = X.shape[1]
        int i, j

        floating sq_dist = 0.0
        floating inertia = 0.0

    for i in range(n_samples):
        j = labels[i]
        sq_dist = _euclidean_dense_dense(&X[i, 0], &centers[j, 0],
                                         n_features, True)
        inertia += sq_dist * sample_weight[i]

    return inertia

There is a single loop that runs through all samples and accumulates one sum, so the function provides no way to get the inertia of each cluster separately. If you need per-cluster inertia, you have to compute it yourself.
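A minimal sketch of computing it yourself, assuming unweighted samples and using NumPy on the fitted model's `labels_` and `cluster_centers_`. The `make_blobs` data and the name `per_cluster_inertia` are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the question's dataframe (an assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=10)

km = KMeans(n_clusters=4, init='random', n_init=10, random_state=10).fit(X)

# Squared Euclidean distance from each sample to its assigned center.
sq_dist = ((X - km.cluster_centers_[km.labels_]) ** 2).sum(axis=1)

# Sum the squared distances separately for each cluster label.
per_cluster_inertia = np.bincount(
    km.labels_, weights=sq_dist, minlength=km.n_clusters
)

print(per_cluster_inertia)        # one value per cluster
print(per_cluster_inertia.sum())  # should agree with km.inertia_ up to rounding
```

If the model was fitted with `sample_weight`, each squared distance would need to be multiplied by its weight before summing, as in the Cython loop above.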

Upvotes: 2

roddar92

Reputation: 366

The inertia_ attribute is a single number: the sum of squared distances of samples to their nearest cluster center.
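Written out for the unweighted case, that single number is the total of the per-cluster sums. The symbols are notation introduced here for illustration: $x_i$ is a sample, $\mu_k$ the center of cluster $k$, $C_k$ the set of samples assigned to it, $n$ the number of samples and $K$ the number of clusters.

```latex
\text{inertia}
  = \sum_{i=1}^{n} \min_{k} \lVert x_i - \mu_k \rVert^2
  = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

The second equality holds because each sample is assigned to its nearest center, so the 4 per-cluster values the question expects are the inner sums, and inertia_ is their total.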

Upvotes: 0
