Franckess

Reputation: 41

How to find the optimal number of clusters?

I know this question has already been asked, but I am failing to implement a decent plot for the following code:

options(digits=1)
set.seed(2014)

mydata <- matrix(seq(1,360),nrow=10,ncol=36)
wss <- c()
for (i in 1:19) wss[i] <- sum(kmeans(x=mydata,centers=seq(1,360,length.out=20)[i])$withinss)
plot(1:9, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")

It produces the following error:

Error in sample.int(m, k) : 
cannot take a sample larger than the population when 'replace = FALSE'

Upvotes: 0

Views: 1066

Answers (2)

Franckess

Reputation: 41

Just a little spark in the dark!

options(digits=1)
set.seed(2014)

mydata <- seq(from=1,to=365)
wss <- c()
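for (i in 5:15){
# Dropping the last of i evenly spaced values leaves i-1 initial centers
wss[i-4] <- sum(kmeans(mydata,centers=floor(seq(from=1,to=365,length.out=i)[-i]))$withinss)
}
# wss holds 11 values, one per cluster count 4..14, so x must cover the same range
plot(4:14,wss,type="b",xlab="Number of Clusters",ylab="Within groups sum of squares")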

Does that make sense? @jlhoward @jbaums

Upvotes: 0

MrFlick

Reputation: 206536

kmeans assumes that each row in your data is an observation. So if you have n rows in x, the $cluster component of the result will have length n. Here your test data has 10 rows, yet when i=2 you are specifying centers of about 20 (seq(1,360,length.out=20)[2] is roughly 19.9). There is no way that 10 observations can form 20 different clusters, which is why sampling the initial centers fails.
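To make that concrete, here is a minimal sketch of the elbow plot with the cluster count kept below the number of rows. It reuses the 10 x 36 matrix from the question; the names ks and the use of sapply are my own choices for illustration, and $tot.withinss is used as a shortcut for sum(...$withinss). Treat it as one possible way to set this up, not the only one.

```r
# Assumption: rows are observations, so k must stay below nrow(mydata).
set.seed(2014)

# Same 10 x 36 test matrix as in the question
mydata <- matrix(seq(1, 360), nrow = 10, ncol = 36)

# Candidate cluster counts: 1 up to one fewer than the number of rows
ks <- 1:(nrow(mydata) - 1)

# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) kmeans(mydata, centers = k)$tot.withinss)

# x and y now have the same length, one point per value of k
plot(ks, wss, type = "b",
     xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")
```

With real data you would probably cap ks at a much smaller value than nrow(mydata) - 1; the upper limit here only reflects how small the test matrix is.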

Upvotes: 3
