inmybrain

Reputation: 386

Gap between actual computation time and expected one in R

I have an iterative algorithm that repeats the same procedure using the parameters updated in the previous iteration, and I am trying to estimate the elapsed time for the whole algorithm.

To do so, I measure the computation time for a single iteration (say, time1iter) and estimate the overall time by multiplying it by the total number of iterations (nIter * time1iter).

However, there is a large gap between my estimate and the actual time. For example, the estimated time is about 8 mins, but the run actually takes less than 6 mins.

I wonder

  1. what causes this gap in general, and
  2. how I can correctly estimate the elapsed time for iterative procedures.

I attach a toy example where you can find this "overestimation".

size <- 1000
nIter <- 100

## A single iteration
s_time <- Sys.time()
tmp <- matrix(rnorm(size^2), size, size)
ss <- 0
for(i in 1:size){
  for(j in 1:size){
    ss <- ss + tmp[i,j]
  }
}
time1iter <- difftime(Sys.time(), s_time, units = "secs")
cat(sprintf("Expected time for %d iterations is %3.f secs\n", 
            nIter, time1iter * nIter))

## Main iterations
s_time <- Sys.time()
for(iter in 1:nIter){
  tmp <- matrix(rnorm(size^2), size, size)
  ss <- 0
  for(i in 1:size){
    for(j in 1:size){
      ss <- ss + tmp[i,j]
    }
  }
}
cat(sprintf("Actual elapsed time is %.3f secs\n", 
            difftime(Sys.time(), s_time, units = "secs")))

The result I got is

Expected time for 100 iterations is 17 secs

Actual elapsed time is 12.948 secs

Upvotes: 1

Views: 77

Answers (1)

dww

Reputation: 31452

If we run the loop several times with increasing numbers of iterations, we get a pretty linear relation between time and number of iterations:

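size <- 1000  # same matrix size as in the question
res <- data.frame(nIter = seq(1, 101, 10), time = NA_real_)
for (ni in seq_len(nrow(res))) {  # loop over every row, including nIter = 101
  nIter <- res[ni, 'nIter']
  s_time <- Sys.time()
  for(iter in 1:nIter){
    tmp <- matrix(rnorm(size^2), size, size)
    ss <- 0
    for(i in 1:size){
      for(j in 1:size){
        ss <- ss + tmp[i,j]
      }
    }
  }
  # store the elapsed time as a plain number of seconds
  res[ni, 'time'] <- as.numeric(difftime(Sys.time(), s_time, units = "secs"))
}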

library(ggplot2)
ggplot(res, aes(nIter, time)) +
  geom_smooth()

[Plot: time (secs) against nIter, with a smoothed fit]

The small intercept is related to things like the overhead of interpreting the loop and of getting and printing the time. In other words, this seems to behave much as one would expect:

lm(time ~ nIter, data = res)    
Coefficients:
(Intercept)        nIter  
   0.009067     0.165585 
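
Since a single timed iteration is just one noisy sample, one way to get a more stable estimate is to time a few pilot iterations and extrapolate from a typical value. The following is only a minimal sketch under assumptions that are not in the question: the helper one_iteration(), the pilot count nPilot, and the reduced size are hypothetical choices made so the example runs quickly.

```r
# Hypothetical helper: one iteration of the toy workload from the question.
one_iteration <- function(size) {
  tmp <- matrix(rnorm(size^2), size, size)
  ss <- 0
  for (i in seq_len(size)) {
    for (j in seq_len(size)) {
      ss <- ss + tmp[i, j]
    }
  }
  ss
}

size   <- 200  # assumption: smaller than the question's 1000 to keep this quick
nIter  <- 100  # iterations planned for the full run
nPilot <- 5    # assumption: number of pilot iterations to time

# Time each pilot iteration separately; "elapsed" is wall-clock seconds.
pilot_secs <- vapply(
  seq_len(nPilot),
  function(k) system.time(one_iteration(size))[["elapsed"]],
  numeric(1)
)

# The median is less sensitive to a single unusually slow pilot iteration.
per_iter_secs  <- median(pilot_secs)
estimated_secs <- per_iter_secs * nIter

cat(sprintf("Estimated time for %d iterations: %.3f secs\n",
            nIter, estimated_secs))
```

The same extrapolation can be done directly from the fitted line: with the coefficients reported above, 100 iterations would be predicted at 0.009067 + 0.165585 * 100, or about 16.57 secs, on the machine that produced the fit.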

Upvotes: 2
