HelpNeeded3

Reputation: 91

How to adjust this function to stop when the output is repeated in R?

Suppose we have the following functions: euclid calculates the Euclidean distance, and k_means implements the full k-means algorithm.

euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA, nrow=dim(points1)[1], ncol=dim(points2)[1])
  for(i in 1:nrow(points2)) {
    distanceMatrix[,i] <- sqrt(rowSums(t(t(points1)-points2[i,])^2))
  }
  distanceMatrix
}

k_means <- function(x, centers, distFun, nItter) {
  clusterHistory <- vector(nItter, mode="list")
  centerHistory <- vector(nItter, mode="list")
  
  for(i in 1:nItter) {
    distsToCenters <- distFun(x, centers)
    clusters <- apply(distsToCenters, 1, which.min)
    centers <- apply(x, 2, tapply, clusters, mean)
    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
  }
  
  list(clusters=clusterHistory, centers=centerHistory)
}

test=data # A data.frame
ktest=as.matrix(test) # Turn into a matrix
centers <- ktest[sample(nrow(ktest), 4),] # Sample some centers, 4 for example

result <- k_means(ktest, centers, euclid, 4) # 4 iterations
print(result)

When tested with a matrix of data, the output looks something like:

$clusters
$clusters[[1]]
  [1] 1 3 3 1 1 1 1 3 3 2 3 1 1 1 1 1 3 3 1 3 1 1 1 2 1 1 1 1 2 1 1 3 1 1 3 3 1 2 2 1 1 1 2 2 3 2 2 2
 [49] 2 2 1 3 1 3 1 3 2 3 1 3 3 2 3 2 1 2 3 1 3 1 1 2 3 1 3 1 3 2 1 3 1 3 2 1 1 2 2 1 1 1 1 1 2 1 3 3

$clusters[[2]]
  [1] 1 3 3 1 1 3 1 3 3 2 3 1 1 1 1 1 3 3 1 3 1 3 1 2 1 1 1 1 2 1 1 3 1 1 3 3 1 1 2 1 1 1 3 2 3 2 2 2
 [49] 3 2 3 3 1 3 1 3 2 3 1 3 3 2 3 2 3 2 3 1 3 3 1 1 3 1 3 1 3 2 1 3 3 3 3 1 1 2 2 1 3 1 1 1 2 1 3 3

$clusters[[3]]
  [1] 1 3 3 1 1 3 1 3 3 2 3 1 1 1 1 1 3 3 1 3 1 3 1 2 1 1 1 1 2 1 1 3 1 1 3 3 1 1 2 1 1 1 3 2 3 2 2 2
 [49] 3 2 3 3 1 3 1 3 2 3 1 3 3 2 3 2 3 2 3 1 3 3 1 1 3 1 3 1 3 2 1 3 3 3 3 1 1 2 2 1 3 1 1 1 2 1 3 3

$clusters[[4]]
  [1] 1 3 3 1 1 3 1 3 3 2 3 1 1 1 1 1 3 3 1 3 1 3 1 2 1 1 1 1 2 1 1 3 1 1 3 3 1 1 2 1 1 1 3 2 3 2 2 2
 [49] 3 2 3 3 1 3 1 3 2 3 1 3 3 2 3 2 3 2 3 1 3 3 1 1 3 1 3 1 3 2 1 3 3 3 3 1 1 2 2 1 3 1 1 1 2 1 3 3

This continues up to the number of iterations specified (4 in this case).

However, I'd like to edit the k_means function so that it stops once an iteration's output is the same as the previous one's. Here that happens at $clusters[[3]], which is identical to $clusters[[2]], yet $clusters[[4]] is still computed and printed unnecessarily. Can anyone advise where specifically to edit the function?

Upvotes: 0

Views: 40

Answers (1)

Javier

Reputation: 457

Include a break statement as follows:

k_means <- function(x, centers, distFun, nItter) {
  clusterHistory <- vector(nItter, mode="list")
  centerHistory <- vector(nItter, mode="list")
  
  for(i in 1:nItter) {
    distsToCenters <- distFun(x, centers)
    clusters <- apply(distsToCenters, 1, which.min)
    centers <- apply(x, 2, tapply, clusters, mean)
    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
    if (i > 1) {
      # Stop if the assignments match the previous iteration
      if (identical(clusterHistory[[i]], clusterHistory[[i-1]])) break
    }
  }
  
  list(clusters=clusterHistory, centers=centerHistory)
}

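You can extend it to also compare centerHistory if needed. Note that the history lists are preallocated with nItter slots, so any slots after the break remain NULL.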
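As a rough sketch of that extension (not part of the original answer), the version below also compares the centers and trims the unused history slots before returning. The name k_means_converge, the tol argument, and the converged flag are hypothetical additions for illustration. Since identical assignments always produce identical centers, the center comparison is mainly useful as a tolerance-based stop, so the sketch stops when either condition holds. The toy data at the end is made up purely to exercise the function.

```r
# Euclidean distance from every row of points1 to every row of points2
# (same logic as the question's euclid)
euclid <- function(points1, points2) {
  distanceMatrix <- matrix(NA_real_, nrow = nrow(points1), ncol = nrow(points2))
  for (i in seq_len(nrow(points2))) {
    distanceMatrix[, i] <- sqrt(rowSums(t(t(points1) - points2[i, ])^2))
  }
  distanceMatrix
}

# Hypothetical variant: stops early and returns only the iterations that ran.
# tol is an assumed argument, passed to all.equal() when comparing centers.
k_means_converge <- function(x, centers, distFun, nItter, tol = 1e-8) {
  clusterHistory <- vector("list", nItter)
  centerHistory <- vector("list", nItter)
  converged <- FALSE
  used <- 0

  for (i in seq_len(nItter)) {
    distsToCenters <- distFun(x, centers)
    clusters <- apply(distsToCenters, 1, which.min)
    centers <- apply(x, 2, tapply, clusters, mean)

    # Saving history
    clusterHistory[[i]] <- clusters
    centerHistory[[i]] <- centers
    used <- i

    if (i > 1) {
      sameClusters <- identical(clusters, clusterHistory[[i - 1]])
      sameCenters <- isTRUE(all.equal(centers, centerHistory[[i - 1]],
                                      tolerance = tol))
      if (sameClusters || sameCenters) {
        converged <- TRUE
        break
      }
    }
  }

  # Drop the preallocated slots that were never filled
  keep <- seq_len(used)
  list(clusters = clusterHistory[keep],
       centers = centerHistory[keep],
       converged = converged)
}

# Made-up toy data: two well-separated groups of three points each
x <- rbind(cbind(c(0, 0.1, 0.2), c(0, 0.1, 0)),
           cbind(c(5, 5.1, 5.2), c(5, 5.1, 5)))
startCenters <- x[c(1, 4), ]  # one starting center from each group

result <- k_means_converge(x, startCenters, euclid, 10)
print(length(result$clusters))  # number of iterations actually run
print(result$converged)
```

The last stored iteration is the one that confirmed the repeat, so it duplicates the one before it. If you would rather not keep it, use seq_len(used - 1) when the loop stopped early.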

Upvotes: 1
