user107511

Reputation: 822

Elbow Method for K-Means in Python

I'm using the K-Means algorithm (in sklearn) to cluster a 1-D array of values, and I want my script to determine the optimal number of clusters (K).

I'm familiar with the Elbow Method, but every implementation I've seen requires plotting the clustering WCSS values and spotting the "elbow" in the plot visually.

Is there a way to find the elbow in code (not visually), or another way to find the optimal K in code?

Upvotes: 0

Views: 1575

Answers (1)

Mechanic Pig

Reputation: 7751

A relatively simple method is to draw a straight line between the points for the minimum and maximum k values on the elbow curve (WSS plotted against k), and then pick the k at which the vertical distance between that line and the curve is largest:

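import numpy as np

from sklearn.cluster import KMeans


def select_k(X: np.ndarray, k_range: np.ndarray) -> int:
    # KMeans expects a 2-D array of shape (n_samples, n_features),
    # so a 1-D array of values is turned into a single-feature column.
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)

    # Within-cluster sum of squares (WSS) for every candidate k.
    wss = np.empty(k_range.size)
    for i, k in enumerate(k_range):
        kmeans = KMeans(n_clusters=int(k), n_init=10)
        kmeans.fit(X)
        # inertia_ is the sum of squared distances to the closest centroid.
        wss[i] = kmeans.inertia_

    # Straight line through the first and last points of the curve.
    slope = (wss[0] - wss[-1]) / (k_range[0] - k_range[-1])
    intercept = wss[0] - slope * k_range[0]
    y = k_range * slope + intercept

    # The elbow is where the curve lies farthest below that line.
    return int(k_range[(y - wss).argmax()])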
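
To see the geometric step on its own, here is a minimal sketch that applies the same line-and-gap idea to a made-up WSS curve, without running K-Means at all. The helper name `elbow_index` and the WSS numbers are assumptions chosen for illustration; they do not come from real data.

```python
import numpy as np


def elbow_index(k_values: np.ndarray, wss: np.ndarray) -> int:
    """Index of the point lying farthest below the straight line that
    joins the first and last points of the (k, WSS) curve."""
    k_values = np.asarray(k_values, dtype=float)
    wss = np.asarray(wss, dtype=float)
    # Line through (k_values[0], wss[0]) and (k_values[-1], wss[-1]).
    slope = (wss[-1] - wss[0]) / (k_values[-1] - k_values[0])
    line = wss[0] + slope * (k_values - k_values[0])
    # Vertical gap between the line and the curve at every k.
    return int(np.argmax(line - wss))


# Hypothetical WSS values for k = 1..6 (illustrative numbers only).
k_values = np.arange(1, 7)
wss = np.array([100.0, 40.0, 15.0, 12.0, 10.0, 9.0])

best_k = int(k_values[elbow_index(k_values, wss)])
print(best_k)  # 3: the gaps are 0, 41.8, 48.6, 33.4, 17.2, 0
```

The gap is always zero at both endpoints, so the selected k falls strictly inside the range whenever the curve bends below the line. In practice the candidate range should extend comfortably past the K you expect.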

Upvotes: 1
