eduardokapp
eduardokapp

Reputation: 1751

How to efficiently grow a vector in R?

Let's say you have a function that takes a number as an input and outputs a vector. However, the output vector size depends on the input and you can't calculate it before the function.

For example, take the 3N+1 famous algorithm. A simple implementation of that algorithm, returning the whole path until 1 could look like this:

compute <- function(x) {
  if (x %% 2 == 0)
    return(x / 2)
  return(3*x + 1)
}

algo <- function(x) {
  if (x == 1)
    return(1)

  output <- x
  while(x != 1) {
    x <- compute(x)
    output <- c(output, x)
  }

  return(output)
}

The algo function returns the whole path of an input X to 1, according to the function. As you can tell, the output variable grows dynamically, using the c() (combine) function.

Are there any alternatives to this? Is growing a list faster? Should I adopt some classic dynamic vector logic, such as initializing an empty N-sized vector and double it everytime it goes full?

EDIT: Please don't mind trying to optimize the way my helper functions are structured. I get it, but that's not the point here! I am only concerned about the c() function and an alternative to it.

Upvotes: 4

Views: 847

Answers (3)

user13963867
user13963867

Reputation:

So, what bothers you is reallocation, and you are right. Let's see.

library(microbenchmark)

microbenchmark({
  a <- c()
  for (i in seq(1e4)) {
    a <- c(a, i)
  }
})

microbenchmark({
  a <- numeric(1e4)
  for (i in seq(1e4)) {
    a[[i]] <- i
  }
})

microbenchmark({
  a <- numeric(1)
  k <- 1
  for (i in seq(1e4)) {
    if (i > k) {
      a <- c(a, numeric(k))
      k <- k + k
    }
    a[[i]] <- i
  }
  a <- head(a, 1e4)
})

And the timings:

Append
     min       lq      mean   median       uq      max neval
 78.0162 78.67925  83.36224 79.54515 81.79535 166.6988   100

Preallocate
     min       lq     mean    median       uq      max neval
1.484901 1.516051 1.567897    1.5552 1.569451 1.895601   100

Amortize
     min       lq     mean    median       uq      max neval
3.316501 3.377201  3.62415  3.484351 3.585701  11.7596   100

Never append many elements to a vector. If possible, preallocate, otherwise amortized allocation will do.

Even if you don't know the actual size beforehand, you may have an upper bound. Then you can still preallocate and truncate in the end. Even a reasonable estimate is useful: preallocate that size, and then resort to amortization if needed.


A remark: R is not good at loops. For small loops, for instance over variables of a dataframe or files in a directory, there is usually no problem. But if you have a long computation that really needs to be achieved with many loops and you can't vectorize, R might not be the right tool. On occasions, writing a function in C, C++, Fortran or Java could help: it's fairly easy to build plugins or to use Rcpp, and the performance gain is considerable.

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101818

Update

As per your edit, maybe you can check the following solution

algo_TIC2 <- function(x) {
  res <- x
  repeat {
    u <- tail(res, 1)
    if (u != 1) {
      res[length(res) + 1] <- if (u %% 2) 3 * u + 1 else u / 2
    } else {
      return(res)
    }
  }
}

You can use recursions like below

compute <- function(x) if (x %% 2) 3*x + 1 else x / 2
algo_TIC1 <- function(x) {
  if (x == 1) {
    return(1)
  }
  c(x, algo_TIC1(compute(x)))
}

and you will see

> algo_TIC1(3000)
 [1] 3000 1500  750  375 1126  563 1690  845 2536 1268  634  317  952  476  238
[16]  119  358  179  538  269  808  404  202  101  304  152   76   38   19   58
[31]   29   88   44   22   11   34   17   52   26   13   40   20   10    5   16
[46]    8    4    2    1

If you don't want any helper function, i.e., compute, you can try

algo_TIC1 <- function(x) {
  if (x == 1) {
    return(1)
  }
  c(x, algo_TIC1(if (x %% 2) 3*x + 1 else x / 2))
}

Upvotes: 1

user2554330
user2554330

Reputation: 44897

You can set the length of a vector and then make assignments to specific elements. Your code would look like this:

algo2 <- function(x) {
  if (x == 1)
    return(1)

  output <- x
  index <- 1
  while(x != 1) {
    x <- compute(x)
    index <- index + 1
    if (index > length(output))
      length(output) <- 2*length(output)
    output[index] <- x
  }

  return(output[seq_len(index)])
}

This makes a difference, though not a big one in your example, because all those calls to compute() (and return()!) are quite costly. If you folded that calculation into algo you'd see more improvement. You could also initialize output to a length that is likely to be good enough for most cases, and rarely need doubling.

Upvotes: 0

Related Questions