Reputation: 1017
I have a dataset that I am analysing to find the optimal number of clusters using k-means.
I am testing cluster counts from 1 to 11, which produces the following plot:
The original dataset has six classes, but the bend in the elbow plot appears at 3 clusters. Out of curiosity, I overlaid a line on the plot running from 11 clusters back, and it stays almost straight down to 6 clusters. That suggests to me that the real elbow is at 6, although it is subtle to see.
So, visually 3 looks to be the right answer, but given the known number of classes (6) the straight line I drew indicates 6...
Question:
Upvotes: 1
Views: 2815
Reputation: 1623
Based on the plot I'd say that there are 6 clusters. From my experience and intuition, I believe it makes sense to say that the "elbow" is where the "within cluster sum of squares" begins to decrease linearly.
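To make that reading of the elbow concrete, here is a minimal sketch: fit a straight line through the last few points of the curve, then walk back towards k = 1 until the curve leaves that line. The function name `linear_tail_elbow`, the parameters `tail` and `rel_tol`, and the WCSS values are all my own illustrative assumptions, not your data, so the tolerance would need tuning for a real curve.

```python
def linear_tail_elbow(ks, wcss, tail=3, rel_tol=0.02):
    """Smallest k from which the WCSS curve follows the straight line
    fitted through its last `tail` points (a heuristic, not a standard API)."""
    # Least-squares line through the last `tail` points of the curve.
    xs, ys = ks[-tail:], wcss[-tail:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    # Allowed deviation from the line, relative to the total drop in WCSS.
    tol = rel_tol * (max(wcss) - min(wcss))

    elbow = ks[-1]
    # Walk from the largest k back towards the smallest; stop at the first
    # point that no longer sits on the extended line.
    for k, w in zip(reversed(ks), reversed(wcss)):
        if abs(w - (slope * k + intercept)) > tol:
            break
        elbow = k
    return elbow


# Made-up WCSS values: a sharp visual bend at k=3, linear decrease from k=6.
ks = list(range(1, 12))
wcss = [1000, 600, 300, 240, 190, 150, 140, 130, 120, 110, 100]
print(linear_tail_elbow(ks, wcss))  # 6 for these made-up values
```

This is the same thing as overlaying a line from 11 clusters back by eye, just with an explicit tolerance, so it inherits the same subjectivity in how close "almost straight" has to be.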
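However, for cluster validation I recommend using silhouette coefficients, because the "right answer" is then obtained objectively (the number of clusters with the highest average silhouette) rather than by eye. In addition, silhouette coefficients take the separation between clusters into account as well, not only how compact each cluster is.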
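As a rough illustration of what the silhouette measures, here is a sketch written straight from the definition: for each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster, and the coefficient is (b - a) / max(a, b). The function name `silhouette_mean` and the toy points are assumptions for the example. In practice I would use `sklearn.metrics.silhouette_score` on the labels from each k-means run instead.

```python
import math


def silhouette_mean(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: cohesion, mean distance to the other members of p's cluster
        # (p's distance to itself is 0, so dividing by len(own) - 1 is right).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: separation, mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): three tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
s3 = silhouette_mean(points, [0, 0, 1, 1, 2, 2])  # one cluster per pair
s2 = silhouette_mean(points, [0, 0, 0, 0, 1, 1])  # two pairs merged
print(s3, s2)
```

On this toy data the three-cluster labelling scores higher than the merged two-cluster one, which is the kind of comparison you would run across k = 2..11 (the silhouette is undefined for k = 1).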
Upvotes: 1