Elbow Method for K-Means in Python

I'm using the K-Means algorithm (in sklearn) to cluster a 1-D array of values, and I want my script to decide the optimal number of clusters (K).

I'm familiar with the Elbow Method, but every implementation I've seen requires plotting the within-cluster sum of squares (WCSS) against K and spotting the "elbow" in the plot visually.

Is there a way to find the elbow in code (not visually), or another way to find the optimal K in code?

Upvotes: 0

Answers (1)

Mechanic Pig

Reputation: 7751

A relatively simple method is to draw a straight line between the points for the minimum and maximum k values on the elbow curve, and then pick the k at which the vertical distance between that straight line and the curve is largest:

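import numpy as np
from sklearn.cluster import KMeans


def select_k(X: np.ndarray, k_range: np.ndarray) -> int:
    # X must be 2-D (n_samples, n_features); for a 1-D array, pass X.reshape(-1, 1).
    # k_range must hold at least two distinct k values.
    wss = np.empty(k_range.size)
    for i, k in enumerate(k_range):
        kmeans = KMeans(n_clusters=k)
        kmeans.fit(X)
        # Within-cluster sum of squares (the same quantity as kmeans.inertia_)
        wss[i] = ((X - kmeans.cluster_centers_[kmeans.labels_]) ** 2).sum()

    # Straight line through the first and last points of the elbow curve
    slope = (wss[0] - wss[-1]) / (k_range[0] - k_range[-1])
    intercept = wss[0] - slope * k_range[0]
    y = k_range * slope + intercept

    # The elbow is the k where the curve lies farthest below that line
    return int(k_range[(y - wss).argmax()])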
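
Since the question is about a 1-D array, here is a minimal sketch of how the same idea might be applied to 1-D input. The names `elbow_k` and `select_k_1d`, the fixed `random_state`, `n_init=10`, and the synthetic data are all assumptions made for illustration, not part of the code above. It reshapes the values into a single column, because `KMeans` expects a 2-D array, and reads the WCSS from the fitted model's `inertia_` attribute.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_k(ks, wss):
    """Return the k whose WCSS lies farthest below the chord joining the endpoints."""
    ks = np.asarray(ks, dtype=float)
    wss = np.asarray(wss, dtype=float)
    # Chord through (ks[0], wss[0]) and (ks[-1], wss[-1]); needs ks[0] != ks[-1]
    slope = (wss[-1] - wss[0]) / (ks[-1] - ks[0])
    chord = wss[0] + slope * (ks - ks[0])
    return int(ks[np.argmax(chord - wss)])


def select_k_1d(values, k_range, random_state=0):
    # KMeans expects shape (n_samples, n_features), so a 1-D array becomes one column
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    wss = [
        KMeans(n_clusters=int(k), n_init=10, random_state=random_state).fit(X).inertia_
        for k in k_range
    ]
    return elbow_k(k_range, wss)


# Synthetic 1-D data (illustrative only): three tight groups near 0, 10 and 20
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(c, 0.5, 50) for c in (0.0, 10.0, 20.0)])
best_k = select_k_1d(values, np.arange(1, 9))
print(best_k)
```

Fixing `random_state` makes the result repeatable between runs. The chosen k also depends on the range of k values tried, because the straight line is anchored at the first and last of them.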

Upvotes: 1
