Reputation: 11
I'm currently working on an R script for the c-means clustering method. I started with a rather simple version to get the basic structure working. The idea is to cluster the values into n classes.
I have a vector of 8 values and I pick two to be my first candidates.
values <- c(4,8,12,5,9,30,75,13)
candidates <- c(values[1],values[6])
Then each element of "values" is assigned to a group according to its distance from the candidates. I'm not sure my version is the most elegant one, but it seems to work.
If an element is closer to candidate one, it goes into group.1; if it is closer to candidate two, it goes into group.2. In each case, the group the value does not belong to gets a placeholder (-999 in the code below), which is removed before the means are calculated.
After going through all the elements of "values", the mean of each group is calculated and the process is repeated, 10 times in this case because of the outer loop.
The idea is, in the end you get the same values over and over again. Those values are the centers of the cluster.
group.2 <- 0
group.1 <- 0
for(j in 1:10){
  for(i in 1:length(values)){
    if( abs(candidates[1]-values[i]) < abs(candidates[2]-values[i]) ){
      group.2[i] <- -999
      group.1[i] <- values[i]
    } else if( abs(candidates[1]-values[i]) > abs(candidates[2]-values[i]) ) {
      group.1[i] <- -999
      group.2[i] <- values[i]
    }
  }
  group.1 <- group.1[!group.1==-999]
  group.2 <- group.2[!group.2==-999]
  candidates<- c(mean(group.1), mean(group.2))
  print(candidates)
}
If you look at the output, you'll see that you actually get the final centers of the clusters after the second repetition.
What I can't figure out is how to make the loop stop, as soon as the results aren't changing anymore.
My idea is to add another loop which terminates the process as soon as
candidates[j]==candidates[j-1]
however I can't figure out how to access the previous value j-1 of the loop.
Upvotes: 0
Views: 804
Reputation: 132706
Better use vectorization and write a function:
values <- c(4,8,12,5,9,30,75,13)
candidates <- c(values[1],values[6])
cmeans <- function(values, candidates, maxiter=10, tol = .Machine$double.eps ^ 0.5, verbose=TRUE) {
  for (j in seq_len(maxiter)) {
    # TRUE where a value is at least as close to the first candidate
    divide <- abs(candidates[1]-values) <= abs(candidates[2]-values)
    group.1 <- values[divide]
    group.2 <- values[!divide]
    candidates.new <- c(mean(group.1), mean(group.2))
    # stop only when every center has stopped moving (max, not min)
    if (max(abs(candidates.new-candidates)) < tol) {
      return(candidates.new)
    } else {
      if (verbose) message(paste(candidates.new, collapse=", "))
      candidates <- candidates.new
    }
  }
}
cmeans(values, candidates)
#8.5, 52.5
#11.5714285714286, 75
#[1] 11.57143 75.00000
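One thing to be aware of: if the loop runs out of iterations before the centers settle, the function above falls off the end of the for loop and returns NULL. Below is a minimal sketch of a variant that also reports the iteration count and whether it converged. The name cmeans.iter and the layout of the returned list are assumptions made for illustration, not part of the original answer, and the sketch assumes both groups stay non-empty.

```r
# Hypothetical variant: name and return layout are illustrative assumptions.
cmeans.iter <- function(values, candidates, maxiter = 10,
                        tol = .Machine$double.eps ^ 0.5) {
  for (j in seq_len(maxiter)) {
    # TRUE where a value is at least as close to the first candidate
    divide <- abs(candidates[1] - values) <= abs(candidates[2] - values)
    candidates.new <- c(mean(values[divide]), mean(values[!divide]))
    # converged only if every center moved by less than tol
    converged <- all(abs(candidates.new - candidates) < tol)
    candidates <- candidates.new
    if (converged) {
      return(list(centers = candidates, iterations = j, converged = TRUE))
    }
  }
  # Ran out of iterations: warn and return the last estimate instead of NULL
  warning("no convergence within maxiter iterations")
  list(centers = candidates, iterations = maxiter, converged = FALSE)
}

res <- cmeans.iter(c(4, 8, 12, 5, 9, 30, 75, 13), c(4, 30))
print(res$centers)
print(res$iterations)
```

With the data from the question this stops in the third iteration, the first one in which the centers no longer change.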
Upvotes: 1
Reputation: 7592
You will need to create a new variable, say old.candidates (shortened to old in the code below), at the beginning of the loop that is set equal to candidates. Then, after setting candidates, check equality and break if they are equal.
candidates <- c(values[1], values[6]) # You have to initialize it before the loop
for(j in 1:10){
  old <- candidates
  # Do stuff
  candidates <- c(mean(group.1), mean(group.2))
  # candidates has two elements, so reduce the comparison with all()
  if(all(old == candidates)) break
}
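A better way would be to check if all(abs(old - candidates) < tol) for some small value of tol, since an exact equality test on floating-point means can fail even when the centers have effectively stopped moving.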
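Putting the pieces together, a minimal sketch of the loop with a tolerance check might look like this. The value of tol and the vectorized group assignment (with ties going to the first candidate) are assumptions for illustration; the data are the ones from the question.

```r
values <- c(4, 8, 12, 5, 9, 30, 75, 13)
candidates <- c(values[1], values[6])
tol <- 1e-8  # assumed tolerance; choose one that suits the scale of your data

for (j in 1:10) {
  old <- candidates
  # Assign each value to the nearer candidate (ties go to the first one)
  nearer.first <- abs(values - candidates[1]) <= abs(values - candidates[2])
  candidates <- c(mean(values[nearer.first]), mean(values[!nearer.first]))
  # Leave the loop once neither center moved by tol or more
  if (all(abs(old - candidates) < tol)) break
}

print(candidates)
print(j)  # iteration in which the loop stopped
```

After the loop, j holds the iteration in which the break happened, so there is no need to index into earlier values of the loop variable.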
Upvotes: 2