I'm using the K-Means algorithm (in sklearn) to cluster a 1-D array of values, and I want my script to determine the optimal number of clusters (K) on its own.
I'm familiar with the Elbow Method, but every implementation I've found requires plotting the WCSS (within-cluster sum of squares) against K and spotting the "elbow" in the plot by eye.
Is there a way to find the elbow in code (not visually), or another way to find the optimal K programmatically?
A relatively simple method is to draw a straight line connecting the points for the smallest and the largest k on the elbow curve (the WSS-vs-k polyline), and then pick the k at which the curve lies furthest below that line, i.e. the point with the maximum vertical distance between the curve and the straight line:
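import numpy as np
from sklearn.cluster import KMeans

def select_k(X: np.ndarray, k_range: np.ndarray) -> int:
    # KMeans expects a 2-D array; turn a 1-D array of values into a single column
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Within-cluster sum of squares (WSS) for each candidate k
    wss = np.empty(k_range.size)
    for i, k in enumerate(k_range):
        kmeans = KMeans(n_clusters=k, n_init=10)
        kmeans.fit(X)
        # Same quantity as kmeans.inertia_
        wss[i] = ((X - kmeans.cluster_centers_[kmeans.labels_]) ** 2).sum()
    # Straight line through the first and last points of the WSS curve
    slope = (wss[0] - wss[-1]) / (k_range[0] - k_range[-1])
    intercept = wss[0] - slope * k_range[0]
    y = k_range * slope + intercept
    # The elbow is the k where the curve lies furthest below that line
    return int(k_range[(y - wss).argmax()])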
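The line-and-gap step can be tried on its own, without running KMeans. Below is a minimal pure-Python sketch, assuming you already have the k values and their WSS values (for example the `wss` array built inside `select_k`). The function name `elbow_k` and the WSS numbers in the example are made up for illustration. Using the perpendicular distance to the line instead of the vertical one picks the same k, since the perpendicular distance is just the vertical gap times a constant factor.

```python
def elbow_k(ks, wss):
    """Return the k whose (k, WSS) point lies furthest below the straight
    line joining the first and last points of the curve (hypothetical helper)."""
    if len(ks) != len(wss) or len(ks) < 3:
        raise ValueError("need at least three matching (k, WSS) pairs")
    k0, k1 = ks[0], ks[-1]
    w0, w1 = wss[0], wss[-1]
    slope = (w1 - w0) / (k1 - k0)
    # Vertical gap between the straight line and the WSS curve at each k
    gaps = [(w0 + slope * (k - k0)) - w for k, w in zip(ks, wss)]
    # Index of the largest gap (first one wins on ties)
    best = max(range(len(ks)), key=lambda i: gaps[i])
    return ks[best]


# Made-up WSS values for k = 1..8; the curve bends sharply at k = 3
ks = [1, 2, 3, 4, 5, 6, 7, 8]
wss = [1000, 400, 150, 120, 100, 90, 85, 82]
print(elbow_k(ks, wss))  # 3
```

Because KMeans is randomly initialised, the WSS values (and occasionally the chosen k) can change between runs; passing a fixed `random_state` to `KMeans` makes the result reproducible.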