Nitesh kumar

Reputation: 376

Why are data points from the same cluster scattered far apart in a k-means clustering plot?

I have a question that has also come up many times in my meetings, and I have never managed to answer it well. I am hoping you can help me understand what is going on.

I used k-means clustering in my project to group a large number of documents by problem area. I also used matplotlib to plot the coordinates of the data points. Quite often, data points that fall into the same cluster appear scattered, or far away from the other documents in that cluster. The question people usually ask me is: if documents or data points belong to the same cluster, they should be close to each other, so why is that not the case here?

How do I convince them? Sometimes I have no idea what to tell them.

A related question: I had no control over how the clusters were formed, but as a domain expert I know very well which problem areas the documents belong to. How can I cluster these thousands of documents accurately into only those problem areas, using k-means or any other clustering mechanism, or by tuning the hyperparameters? Kindly help me.

I have taken this reference: http://brandonrose.org/clustering


In the plot from that reference, "Father, New York, brother" is one cluster, drawn in purple. If those points belong to the same cluster, they should all sit close to each other on one side of the plot. Why are they scattered all over it? That is also what happens in my case.

Upvotes: 0

Views: 414

Answers (1)

Frank Puffer

Reputation: 8215

You provide very little information about your data, so this answer is a bit speculative. But I am quite sure that your data points have more than two components and that you do the k-means clustering in a space with at least three dimensions. Then you use some kind of projection to display them in 2D. Because of the projection, points that are originally far away from each other can seem to be close together, so clusters that are well separated in the original space may appear to overlap or look scattered in the plot. The 2D plot says little about the neighborhood relations in the original, higher-dimensional space.
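
Here is a minimal sketch of that effect. It assumes made-up 3D data, not your documents: two hypothetical groups that differ only in their third coordinate, "projected" to 2D by simply dropping that coordinate. All names (`make_blob`, `project_2d`, the group centers) are invented for illustration, and your actual projection method is probably different.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def make_blob(center, n, spread, rng):
    """Hypothetical helper: n Gaussian points scattered around `center`."""
    return [tuple(c + rng.gauss(0, spread) for c in center) for _ in range(n)]


def project_2d(points):
    """Crude projection (an assumption): keep x and y, drop every other axis."""
    return [p[:2] for p in points]


rng = random.Random(0)

# Two groups that are far apart only along the third axis.
cluster_a = make_blob((0.0, 0.0, 0.0), 200, 1.0, rng)
cluster_b = make_blob((0.0, 0.0, 10.0), 200, 1.0, rng)

# Distance between the group centers in the original space and in the 2D view.
gap_3d = dist(centroid(cluster_a), centroid(cluster_b))
gap_2d = dist(centroid(project_2d(cluster_a)), centroid(project_2d(cluster_b)))

print(f"distance between centers in 3D: {gap_3d:.2f}")
print(f"distance between centers in 2D: {gap_2d:.2f}")
```

In 3D the two groups are clearly separated, but in the 2D view their centers nearly coincide, so a scatter plot would show the two colors mixed together. With document vectors that have hundreds of components, much more information is lost when going down to two dimensions.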

Upvotes: 1
