Ester Silva

Reputation: 680

EM clustering instead of Kmeans

I have the following script, which finds the best number of clusters using k-means. How can I change it to use EM clustering instead of k-means?

Reproducible example:

ourdata <- scale(USArrests)

# Total sum of squares, used as the k = 1 baseline
wss <- (nrow(ourdata) - 1) * sum(apply(ourdata, 2, var))
# Within-cluster sum of squares for k = 2, ..., 10
for (i in 2:10) wss[i] <- sum(kmeans(ourdata, centers = i)$withinss)

plot(1:10, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")

Any help is appreciated!

Upvotes: 0

Views: 226

Answers (1)

MHammer

Reputation: 1314

The EMCluster package offers a variety of functions for running EM model-based clustering. An example of finding a solution with k = 3 clusters:
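The k = 3 snippet itself is not shown here. A minimal sketch, assuming the same em.EM() call that appears in the update below with its remaining arguments left at their defaults, might look like this:

```r
library(EMCluster)

ourdata <- scale(USArrests)

set.seed(1)  # EM starts from random initial values, so fix the seed
# Fit a Gaussian mixture with 3 components (assumed defaults for other arguments)
em_fit <- em.EM(ourdata, nclass = 3)

# Cluster label assigned to each state, and the resulting cluster sizes
head(em_fit$class)
table(em_fit$class)
```

The `class` element holds one label per row of `ourdata`, so it can be passed to any function that expects a clustering vector.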

Update per OP's comment:

You can calculate the within-cluster sum of squares, along with other metrics of interest, using fpc::cluster.stats(). These can be extracted and plotted as in your original post. As a reminder, describing "the elbow technique" the way you did is inaccurate: it is a general technique that can be, and is, used with any metric of choice, not only the within-cluster sum of squares as in your original post.

library(EMCluster)
library(fpc)

ourdata <- scale(USArrests)
# cluster.stats() works on a distance matrix, so compute it once up front
dist_fit <- dist(ourdata)

num_clusters <- 2:4

set.seed(1)
# For each k, fit an EM model and extract the within-cluster sum of squares
wss <- vapply(num_clusters, function(i_k) {
  em_fit <- em.EM(ourdata, nclass = i_k, lab = NULL, EMC = .EMC,
                  stable.solution = TRUE, min.n = NULL, min.n.iter = 10)
  cluster_stats_fit <- fpc::cluster.stats(dist_fit, em_fit$class)
  cluster_stats_fit$within.cluster.ss
}, numeric(1))

plot(num_clusters, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")
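
Because the plot works with any metric, the same loop can record a different statistic. A minimal sketch, assuming the average silhouette width (the `avg.silwidth` element returned by fpc::cluster.stats()) is the metric of interest; that choice is only an illustration and is not part of the answer above:

```r
library(EMCluster)
library(fpc)

ourdata <- scale(USArrests)
dist_fit <- dist(ourdata)
num_clusters <- 2:4

set.seed(1)
# For each k, fit an EM model and keep the average silhouette width
sil <- vapply(num_clusters, function(i_k) {
  em_fit <- em.EM(ourdata, nclass = i_k)
  fpc::cluster.stats(dist_fit, em_fit$class)$avg.silwidth
}, numeric(1))

plot(num_clusters, sil, type = "b", xlab = "Number of Clusters", ylab = "Average silhouette width")
```

Unlike the within-cluster sum of squares, a larger silhouette width indicates better-separated clusters, so here you would look for a peak rather than an elbow.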

Upvotes: 1
