Reputation: 53
I need help finding the optimal number of clusters for k-means clustering in R.
My code is
library(cluster)
library(factoextra)
#read data
data <- read.csv("../file.txt", header = FALSE, sep = " ")  # forward slash: "\f" in an R string is a form-feed escape
#determine number of clusters to use
k.max<- 22
wss <- sapply(2:k.max, function(k){kmeans(data, k, nstart=10 )$tot.withinss})
print(wss)
plot(2:k.max, wss, type="b", pch = 19, xlab="Number of clusters K", ylab="Total within-clusters sum of squares")
fviz_nbclust(data, kmeans, method = "wss") + geom_vline(xintercept = 3, linetype = 2)
I get the plot, but I still do not know how to determine the number of clusters from it.
Thanks
Upvotes: 3
Views: 10858
Reputation: 41
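# average silhouette width for each k up to 30 (df is the data to cluster, i.e. "data" in the question)
n_clust <- fviz_nbclust(df, kmeans, method = "silhouette", k.max = 30)
n_clust <- n_clust$data
# clusters is a factor, so go through as.character() to get the k label rather than the level index
max_cluster <- as.numeric(as.character(n_clust$clusters[which.max(n_clust$y)]))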
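For reference, the same selection can be sketched without factoextra by computing the average silhouette width directly with silhouette() from the cluster package. The data below is synthetic (three well-separated Gaussian blobs), and the names blobs, avg_sil and best_k, the seed, and the range 2:8 are assumptions made only for this illustration, not part of the original code.

```r
library(cluster)

# Synthetic example data: three well-separated 2-D blobs, 50 points each (assumed for illustration)
set.seed(1)
blobs <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  cbind(rnorm(50, mean = 10), rnorm(50, mean = 0)),
  cbind(rnorm(50, mean = 0), rnorm(50, mean = 10))
)

d <- dist(blobs)   # pairwise Euclidean distances, reused for every k
ks <- 2:8          # silhouette width is undefined for k = 1

# Average silhouette width of the k-means partition for each k
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(blobs, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(best_k)
```

A higher average silhouette width means points sit closer to their own cluster than to the nearest other one, so the maximum is a common (though not definitive) choice for k.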
Upvotes: 4
Reputation: 77454
There is no sound mathematical definition of the "elbow": because the x and y axes have different scales, there is no well-defined angle. In plots like yours there probably is no "elbow" at all.
Most likely, k-means did not work for any k. This happens quite often, for example when your data doesn't contain clusters.
Try generating uniform data and making the same plot; it will look similar.
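A minimal sketch of that experiment, assuming 500 points in 2 dimensions (the sizes, the seed, and the names unif and wss_unif are arbitrary choices for illustration). It repeats the WSS loop from the question on data that has no cluster structure by construction:

```r
# Uniform noise: there are no real clusters in this data
set.seed(42)                               # arbitrary seed, for reproducibility
unif <- matrix(runif(500 * 2), ncol = 2)   # 500 points, 2 dimensions (assumed sizes)

k.max <- 22
# Total within-cluster sum of squares for k = 2..k.max, as in the question
wss_unif <- sapply(2:k.max, function(k) {
  kmeans(unif, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

plot(2:k.max, wss_unif, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
```

If the curve for your own data has the same smooth, steadily decreasing shape as this one, the plot is not giving evidence for any particular k.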
Upvotes: 0