Reputation: 591
So, following the example in the documentation (here):
The KElbowVisualizer implements the “elbow” method to help data scientists select the optimal number of clusters by fitting the model with a range of values for K. If the line chart resembles an arm, then the “elbow” (the point of inflection on the curve) is a good indication that the underlying model fits best at that point.
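The example boils down to roughly this (a sketch of my setup, not the documentation's exact code; testDF is my feature matrix):

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# testDF is assumed to be a numeric feature matrix (e.g. a pandas DataFrame)
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2, 10))  # fit KMeans for K = 2..9
visualizer.fit(testDF)  # fit the data to the visualizer
visualizer.show()       # poof() in older Yellowbrick releases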
What if there is no elbow in the chart? When I run the same code on my data set, the distortion score just keeps increasing with every additional cluster, so the curve never bends into an elbow.
However, when I run another example of the elbow method, one that uses the kmeans.inertia_ attribute:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(testDF)
    # inertia_: sum of squared distances of samples to their closest cluster center
    sse[k] = kmeans.inertia_

# assign the labels after the loop; doing it inside would add a "clusters"
# column to testDF and leak it into the features of every later fit
testDF["clusters"] = kmeans.labels_
# print(testDF["clusters"])

plt.figure()
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()
This time the output plot does have a clear elbow.
What is the difference between these two methods? Why is there no elbow in the first graph? According to the documentation, both use the same measure: the "Sum of squared distances of samples to their closest cluster center."
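To sanity-check what inertia_ measures, I reproduced it by hand (a rough sketch; X here stands for a plain NumPy feature matrix, without the extra clusters column):

import numpy as np
from sklearn.cluster import KMeans

# X is assumed to be a numeric NumPy array of shape (n_samples, n_features)
km = KMeans(n_clusters=3, random_state=0).fit(X)

# squared distance from every sample to its assigned (closest) cluster center
d2 = ((X - km.cluster_centers_[km.labels_]) ** 2).sum(axis=1)
print(d2.sum(), km.inertia_)  # the two numbers should agree up to float error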
Upvotes: 3
Views: 3857
Reputation: 560
I had the same problem, and the elbow point showed up after updating kmodes (I am clustering binary data):
pip install -U kmodes
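For reference, a manual elbow loop with k-modes would look roughly like this (a sketch; X_bin stands for your binary feature matrix):

import matplotlib.pyplot as plt
from kmodes.kmodes import KModes

# X_bin is assumed to be a binary (0/1) feature matrix
costs = {}
for k in range(1, 10):
    km = KModes(n_clusters=k, n_init=5, verbose=0).fit(X_bin)
    costs[k] = km.cost_  # total dissimilarity of samples to their closest mode

plt.plot(list(costs.keys()), list(costs.values()))
plt.xlabel("Number of clusters")
plt.ylabel("Cost")
plt.show()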
Upvotes: 0
Reputation: 959
I had the same problem just now, and updating to Yellowbrick v1.1 fixed it:
pip install -U yellowbrick
or in a Jupyter cell:
!pip install -U yellowbrick
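After upgrading, the visualizer should also mark the elbow for you (a minimal sketch; X stands for your feature matrix, and elbow_value_ is only set when elbow detection is enabled, which I believe is the default in recent releases):

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# X is assumed to be your numeric feature matrix
visualizer = KElbowVisualizer(KMeans(), k=(2, 10))
visualizer.fit(X)
visualizer.show()
print(visualizer.elbow_value_)  # the K Yellowbrick marks with the dashed line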
Upvotes: 1