jon

Reputation: 11366

Loop in R until a value converges, storing all the outputs

I would like to repeat a process until a condition is met, at the same time storing the outcomes.

Here is a simple case where I know the number of cycles to perform in the loop:

# just example data
smpls <- rnorm(100, 50, 50)

ncycles <- 1000
outm <- matrix(nrow = ncycles, ncol = 1)

# repeat the process for ncycles cycles
for (i in 1:ncycles) {
    outm[i] <- mean(sample(smpls, 50))
}

# get the average of outm
mean(outm)

But my case is different in that I do not know ncycles in advance. I want to continue sampling until the sample means have very low variance, i.e. until they converge (I guess this calls for a "while" loop), for example until vsd is less than 1 in the following case:

vsd <- NULL
outm <- mean(sample(smpls, 50))
while (vsd > 1) {
    outm[i] <- mean(sample(smpls, 50))
    vsd <- sd(outm)
}

I do not know what value i should be set to here. Help appreciated.

Edits:

smpls <- rnorm(100, 50, 50)
iter <- 0
# maximum iterations
itermax <- 1000
outm <- rep(NA, itermax)
vsd <- 2
while ((vsd > 1) && (iter < itermax)) {
    outm[iter] <- mean(sample(smpls, 50))
    vsd <- sd(outm)
    iter <- iter + 1
}
Error in while ((vsd > 1) && (iter < itermax)) { : 
  missing value where TRUE/FALSE needed

The main idea of stopping at convergence is to save time. Although the above example with just the mean function is quick, my original function needs significant time per iteration, and I want to stop it as soon as it converges.
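
For reference, here is my current guess at where the NA in the while() condition comes from (a minimal sketch assuming standard R behavior; x stands in for outm):

# R is 1-indexed, so outm[0] <- ... assigns nothing and outm stays all NA;
# sd() of an all-NA vector is NA, and NA > 1 is NA, which while() rejects
x <- rep(NA_real_, 5)
x[0] <- 1   # silently a no-op
sd(x)       # NA
NA > 1      # NA -> "missing value where TRUE/FALSE needed"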

Upvotes: 1

Views: 2820

Answers (3)

SHRram

Reputation: 4227

Here is a solution:

Data:

set.seed(123) # so that you can replicate what I did 
smpls <- rnorm(100,50,50)

I think you need some initialization cycles (a minimum number of iterations) so that you do not get false convergence from a small number of samples. So run a few iterations first, say miniter. You also need a maximum number of iterations so that your loop does not run wild, say maxiter.

meanconverge <- function(data, miniter, maxiter, tolerance) {
    outm <- rep(NA, maxiter)
    # initialization cycles: run miniter samples before testing convergence
    for (i in 1:miniter) {
        outm[i] <- mean(sample(data, 50))
    }
    # sd of the initial cycles
    vsd <- sd(outm, na.rm = TRUE)
    if (vsd > tolerance) {
        iter <- miniter
        sdout <- rep(NA, maxiter)  # track how the sd evolves
        while ((vsd > tolerance) && (iter < maxiter)) {
            iter <- iter + 1
            outm[iter] <- mean(sample(data, 50))
            vsd <- sd(outm, na.rm = TRUE)
            sdout[iter] <- vsd
        }
        return(list(outm, sdout))
    } else {
        return(outm)
    }
}

out <- meanconverge(data = smpls, miniter = 50, maxiter = 100000, tolerance = 3)
plot(unlist(out[2]), pch = ".", col = "red")

[Plot of the running standard deviation (sdout) over iterations]

plot(unlist(out[1]), pch = ".", col = "red")

[Plot of the stored sample means (outm)]

Upvotes: 1

theWanderer4865

Reputation: 871

Checking for convergence is a tricky thing. A great way to get started is to look at how the value changes while you are computing. Convergence is all about getting arbitrarily close to a boundary; programmatically, you have to decide what "arbitrary" means. You also need to decide how you will measure convergence.

To illustrate, suppose I want to know whether my successive estimates are getting really close to each other. I might have something like:

# inside my function or method that performs this convergence feat,
# assuming outcomes[1] holds an initial estimate and i starts at 1
while (while_condition && i < itermax) {

    i <- i + 1
    outcomes[i] <- some_complicated_foo(bar)

    # stop once successive estimates are within tolerance of each other
    if (abs(outcomes[i - 1] - outcomes[i]) <= tolerance) {
        while_condition <- FALSE # i.e. STOP LOOPING
        return(outcomes)
    }

}

Where tolerance is your definition of arbitrary closeness. Now, this seems like a hammer for your nail, right? Well, what happens if you converge to the wrong answer? How will you know? Does this thing even converge? The trick to these kinds of problems is making clever guesses about your functions or the data-generating process you are analyzing. But having a maximum iteration boundary will definitely ease some computing time, as long as it is reasonable. The real way to know if you are right is to use tests (like statistical tests or unit tests) to see if there is any 'garbage-in-garbage-out', or whether you get something different than you'd expect from a contrived example with a well-known answer.
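
As a sketch of that testing idea (the data, names, and threshold below are my own, contrived so the right answer is known in advance):

# contrived check with a well-known answer: the true mean is 10,
# so a converged estimate of the mean should land close to it
set.seed(42)
known <- rnorm(1e4, mean = 10, sd = 2)
est <- mean(replicate(200, mean(sample(known, 50))))
stopifnot(abs(est - 10) < 0.2)  # crude unit-test style assertion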

Check out optimization algorithms to see how they handle this; see ?optim or another optimization package to see how the pros do it.
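
For instance, a minimal sketch with a toy quadratic objective (the function and starting value are mine, just for illustration):

# optim() applies its own tolerance-based stopping rules and
# reports the outcome in $convergence
fit <- optim(par = 1, fn = function(x) (x - 3)^2, method = "BFGS")
fit$par          # ~3, the minimizer
fit$convergence  # 0 means the stopping criterion was satisfied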

Upvotes: 0

Dale

Reputation: 91

Two problems in your code:

1) You need sd(..., na.rm = TRUE).

2) You need to make sure there are at least two non-NA numbers in outm, so that sd(outm, na.rm = TRUE) is not NA.

Just by the way, given the sd you specify to rnorm, I don't think you'll ever need more than a couple of dozen iterations:

sim <- function() {
  smpls <- rnorm(100, 50, 5)
  itermax <- 1000
  outm <- rep(NA, itermax)
  # seed outm with one value so the first sd() inside the loop
  # is computed from at least two numbers
  outm[1] <- mean(sample(smpls, 50))
  iter <- 1
  vsd <- 2
  while ((vsd > 1) && (iter < itermax)) {
    iter <- iter + 1
    outm[iter] <- mean(sample(smpls, 50))
    vsd <- sd(outm, na.rm = TRUE)
  }

  iter
}

set.seed(666)
iters <- replicate(100000, sim() )
range(iters)  # c(2, 11)
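
A quick back-of-envelope check of why so few iterations are needed (my own arithmetic, not part of the simulation):

# the sd of a mean of n draws is roughly sigma / sqrt(n);
# with sigma = 5 and n = 50 each stored mean varies by only ~0.7,
# already below the tolerance of 1
5 / sqrt(50)  # ~0.707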

Cheers.

Upvotes: 2
