Anton

Reputation: 591

Kmeans elbow method not returning an elbow

So following the example in documentation (here):

The KElbowVisualizer implements the “elbow” method to help data scientists select the optimal number of clusters by fitting the model with a range of values for K. If the line chart resembles an arm, then the “elbow” (the point of inflection on the curve) is a good indication that the underlying model fits best at that point.

What if there is no elbow in the chart? When I run the same code on my data set, the output is:

[plot: distortion score rising monotonically with k — no elbow]

So the distortion score keeps increasing with every additional cluster.

However, when I run another example of the elbow method, using the kmeans.inertia_ attribute:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(testDF)
    # Note: avoid writing kmeans.labels_ back into testDF inside this loop,
    # or the label column becomes an extra feature for every subsequent fit.
    sse[k] = kmeans.inertia_  # inertia: sum of squared distances of samples
                              # to their closest cluster center
plt.figure()
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()

The output is:

[plot: SSE vs. number of clusters, showing a clear elbow]

Which does have an elbow.

What is the difference between these two methods? Why is there no elbow on the first graph?

According to the documentation, both apply the same distance measure, i.e. the "Sum of squared distances of samples to their closest cluster center."
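They should indeed be plotting the same quantity. A quick way to convince yourself, sketched here on synthetic `make_blobs` data (an assumption, not the asker's data set): `kmeans.inertia_` can only shrink as k grows when the input data is held fixed, so a rising "elbow" curve usually means something other than k is changing between fits (for example, columns being added to the DataFrame inside the loop).

```python
# Sanity-check sketch (synthetic make_blobs data is an assumption, not the
# asker's data set): the distortion both methods plot is kmeans.inertia_,
# and with the input held fixed it should decrease as k grows.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

sse = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_  # sum of squared distances to the closest center

# With enough restarts the curve is (practically) monotone decreasing, so an
# *increasing* elbow plot points at the data changing between fits.
assert all(sse[k] >= sse[k + 1] for k in range(1, 9))
```

If this monotone-decrease check fails on your own pipeline, inspect what the loop does to the input between iterations before blaming the elbow method itself.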

Upvotes: 3

Views: 3857

Answers (2)

Mr.Slow

Reputation: 560

I had the same problem, and the elbow point showed up after updating kmodes (I am clustering binary data):

pip install -U kmodes

Upvotes: 0

Philip Seyfi

Reputation: 959

I had the same problem just now and updating to Yellowbrick v1.1 fixed it.

pip install -U yellowbrick

or in a Jupyter cell:

!pip install -U yellowbrick

Upvotes: 1
