SEU

Reputation: 1390

clustering algorithm with minimum number of points

I am trying to separate a data set that has 2 clusters that do not overlap in any way, plus a single data point that lies far from both clusters.

When I use kmeans() to get the 2 clusters, it splits one of the "valid" clusters in half and treats the single data point as a separate cluster.

Is there a way to specify a minimum number of points per cluster? I am using MATLAB.

Upvotes: 0

Views: 832

Answers (1)

Romain Reboulleau

Reputation: 306

There are several solutions:

  1. Easy: try with 3 clusters;
  2. Easy: remove the single data point (which you can detect as an outlier with any outlier detection technique);
  3. To be tried: use a k-medoids approach instead of k-means. Medoids are less sensitive to outliers than means, so this sometimes helps;
  4. More complicated, but should work: perform spectral clustering. This gets around the main issue of k-means, which is its direct use of the Euclidean distance.
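
Options 1 and 2 can be combined into something close to a "minimum number of points" rule. The following is only a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the synthetic data `X` and the threshold `minPts` are made up for illustration and are not from the question. It clusters with 3 centres, discards any cluster smaller than `minPts`, and then re-clusters the remaining points into 2 groups:

```matlab
% Synthetic stand-in for the data in the question (assumed, not the real data)
rng(0);                                          % reproducible random numbers
X = [0.3*randn(50,2);                            % first compact cluster near (0,0)
     0.3*randn(50,2) + repmat([5 0], 50, 1);     % second compact cluster near (5,0)
     20 20];                                     % single far-away point

minPts = 5;    % hypothetical threshold: smallest size accepted as a real cluster
k      = 3;    % one extra cluster so the lone point can get its own

% Step 1: over-cluster; several replicates reduce the risk of a poor local optimum
idx    = kmeans(X, k, 'Replicates', 10);
counts = accumarray(idx, 1, [k 1]);              % number of points in each cluster

% Step 2: flag every point that belongs to an undersized cluster
isSmall = ismember(idx, find(counts < minPts));

% Step 3: re-cluster only the remaining points into the 2 wanted groups
labels           = zeros(size(X,1), 1);          % 0 marks discarded points
labels(~isSmall) = kmeans(X(~isSmall,:), 2, 'Replicates', 10);
```

Here `labels` is 0 for the discarded point and 1 or 2 for the two real clusters. If several small groups of outliers are expected, `k` would have to be raised accordingly, so treat the values above as a starting point rather than a general recipe.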

More explanation of where k-means behaves poorly can be found on the Cross Validated site (see here for instance).

Upvotes: 1
