Reputation: 69
Using hierarchical clustering with Ward's minimum variance method in R, I found the clustering pattern shown below. I chose five clusters empirically, based on whether the characteristics of the individuals in each cluster make sense. If I instead cut the dendrogram at a fixed height (indicated by the 'Cut' line in the diagram), I still get the same four clusters, but the 5th cluster (the blue one) is split into two further clusters.
Question: Is it mandatory to cut the 5th cluster at a specific height, even if the split does not make sense according to research-based knowledge? Or can I decide empirically to keep five clusters? Does this introduce any bias into the analysis?
Upvotes: 0
Views: 1070
Reputation: 10385
Clustering is subjective to a certain degree (even more so than supervised learning), since no one knows the true number of clusters, or whether the clusters are really different enough to be treated as separate classes. If you think that splitting the 5th cluster does not make sense based on your domain knowledge, then you can choose not to split it. Just make sure that you document this clearly, so that people will know what you did and why.
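To make the two options concrete, here is a minimal sketch in R comparing a height-based cut with asking for five clusters directly. The data are simulated and purely illustrative (the group centers, `h_cut`, and all variable names are my own assumptions, not taken from your analysis), and I am assuming `method = "ward.D2"` on Euclidean distances, which may differ from what you ran.

```r
# Toy data, purely illustrative: six 1-D groups, the last two close together.
set.seed(1)
centers <- c(0, 10, 20, 30, 40, 43)
x <- matrix(rep(centers, each = 10) + rnorm(60, sd = 0.3), ncol = 1)

# Ward's minimum variance on Euclidean distances (assumed variant: ward.D2).
hc <- hclust(dist(x), method = "ward.D2")

# Pick a height between the two merges that separate 6 from 5 clusters,
# so that cutting there leaves 6 clusters.
n <- nrow(x)
h_cut <- mean(hc$height[c(n - 6, n - 5)])
by_height <- cutree(hc, h = h_cut)

# Alternatively, ask for 5 clusters directly.
by_k <- cutree(hc, k = 5)

# Cross-tabulate the two partitions to see which cluster the height cut splits.
tab <- table(k5 = by_k, height_cut = by_height)
print(tab)
```

In this toy case the cross-table shows that the height cut only subdivides one of the five clusters and leaves the others unchanged, which is the situation you describe. Reporting such a table alongside your reasoning for keeping the cluster whole is one way to document the decision.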
Upvotes: 1