Sandy

Reputation: 1148

K-means clustering with pre-defined centroids

I'm trying to run the k-means algorithm with predefined centroids. I have looked at the following posts:

1. R k-means algorithm custom centers

2. Set static centers for kmeans in R

However, every time I run the command:

km = kmeans(df_std[,c(10:13)], centers = centroids)

I get the following error:

**Error: empty cluster: try a better set of initial centers**

I have defined the centroids as:

centroids = matrix(c(140.12774, 258.62615, 239.36800, 77.43235,
                     33.37736,  58.73077,  68.80000,  12.11765,
                     0.8937264, 0.8118462, 0.8380000, 0.8052941,
                     11.989858, 12.000000, 8.970000,  1.588235),
                   ncol = 4, byrow = TRUE)

My data is a subset of a data frame, df_std, which has already been scaled:

df_std[,c(10:13)]

Why does R give the above error? Any help would be highly appreciated!

Upvotes: 1

Views: 4731

Answers (2)

Has QUIT--Anony-Mousse

Reputation: 77454

Use a nearest-neighbor classifier on the centers only; do not recluster.

That means every point is simply given the label of its nearest center. This is similar to k-means, but you do not change the centers, you do not need to iterate, and every new data point can be processed independently and in any order. No problem arises even when processing a single point at a time. (In your case, k-means failed because one cluster became empty!)
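A minimal sketch of that nearest-center assignment in base R. The helper name assign_to_nearest and the toy data are made up for illustration; they are not from the question.

```r
# Hypothetical helper: label each row of `x` with the index of its nearest center.
assign_to_nearest <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  stopifnot(ncol(x) == ncol(centers))
  # Squared Euclidean distance from every point (rows) to every center (columns)
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ], "-")^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))  # keep matrix shape even for a single point
  # Column with the smallest distance, i.e. the largest negated distance
  max.col(-d2, ties.method = "first")
}

# Toy data, made up for illustration
pts <- rbind(c(0, 0), c(1, 1), c(9, 9), c(10, 10))
ctr <- rbind(c(0, 0), c(10, 10), c(100, 100))
labels <- assign_to_nearest(pts, ctr)
```

Here labels is 1 1 2 2. The third center receives no points, and that is not an error, because the centers are never recomputed.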

Upvotes: 4

Sandy

Reputation: 1148

While browsing for the specific error that I posted above:

Error: empty cluster: try a better set of initial centers

I found the following link to a conversation:

http://r.789695.n4.nabble.com/Empty-clusters-in-k-means-possible-solution-td4667114.html

Broadly speaking, this error occurs when the initial centroids do not match the data, so that at least one centroid ends up with no points assigned to it.

It can happen when centers is a number k: because k-means starts from random centres, there is a possibility that they do not match the data.

It can also happen when centers is a matrix of centroids (my case). The problem was that my data was scaled but my centroids were unscaled.
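A sketch of one way to put the centroids on the same scale as the data. It assumes the data was standardized with scale(), which stores the column means and standard deviations as attributes; the question does not say how df_std was scaled, and the toy data here is made up.

```r
# Toy raw data, made up for illustration (not the asker's df_std)
set.seed(1)
raw <- rbind(
  matrix(rnorm(40, mean = 100, sd = 5), ncol = 2),
  matrix(rnorm(40, mean = 200, sd = 5), ncol = 2)
)
raw_centers <- rbind(c(100, 100), c(200, 200))  # centroids on the raw scale

scaled <- scale(raw)  # assumption: the data was standardized with scale()

# Apply the same column means and standard deviations to the centroids
scaled_centers <- scale(
  raw_centers,
  center = attr(scaled, "scaled:center"),
  scale  = attr(scaled, "scaled:scale")
)

# Data and initial centers now live on the same scale
km <- kmeans(scaled, centers = scaled_centers)
```

Passing raw_centers to kmeans(scaled, ...) instead would be expected to leave a cluster empty, since every scaled point lies far closer to one raw centroid than to the other.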

The link above made me realise that there was a bug in my code. I hope it helps someone in a similar situation!

Upvotes: 0
