Reputation: 1751
Let's say you have a function that takes a number as input and outputs a vector. However, the output vector's size depends on the input, and you can't calculate it before running the function.
For example, take the famous 3N+1 algorithm (the Collatz sequence). A simple implementation of that algorithm, returning the whole path down to 1, could look like this:
compute <- function(x) {
  if (x %% 2 == 0)
    return(x / 2)
  return(3 * x + 1)
}
algo <- function(x) {
  if (x == 1)
    return(1)
  output <- x
  while (x != 1) {
    x <- compute(x)
    output <- c(output, x)
  }
  return(output)
}
The algo function returns the whole path from an input x down to 1, applying compute at each step. As you can tell, the output variable grows dynamically, using the c() (combine) function.
Are there any alternatives to this? Is growing a list faster? Should I adopt some classic dynamic vector logic, such as initializing an empty N-sized vector and doubling it every time it fills up?
EDIT: Please don't mind trying to optimize the way my helper functions are structured. I get it, but that's not the point here! I am only concerned about the c() function and an alternative to it.
Upvotes: 4
Views: 847
Reputation:
So, what bothers you is reallocation, and you are right. Let's see.
library(microbenchmark)
microbenchmark({
  a <- c()
  for (i in seq(1e4)) {
    a <- c(a, i)
  }
})

microbenchmark({
  a <- numeric(1e4)
  for (i in seq(1e4)) {
    a[[i]] <- i
  }
})

microbenchmark({
  a <- numeric(1)
  k <- 1
  for (i in seq(1e4)) {
    if (i > k) {
      a <- c(a, numeric(k))
      k <- k + k
    }
    a[[i]] <- i
  }
  a <- head(a, 1e4)
})
And the timings:
Append
min lq mean median uq max neval
78.0162 78.67925 83.36224 79.54515 81.79535 166.6988 100
Preallocate
min lq mean median uq max neval
1.484901 1.516051 1.567897 1.5552 1.569451 1.895601 100
Amortize
min lq mean median uq max neval
3.316501 3.377201 3.62415 3.484351 3.585701 11.7596 100
Never append many elements to a vector one at a time. If possible, preallocate; otherwise, amortized allocation will do.
Even if you don't know the actual size beforehand, you may have an upper bound: preallocate to that bound and truncate at the end. Even a reasonable estimate is useful: preallocate that size, and fall back to amortization if it turns out to be too small.
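For the Collatz example from the question, the upper-bound idea could be sketched like this (the default bound here is a hypothetical guess, not a proven limit on Collatz path lengths; if the guess is too small, R grows the vector automatically on out-of-range assignment, which is the slow path):

```r
algo_bound <- function(x, bound = 1000L) {
  output <- numeric(bound)   # preallocate to the assumed upper bound
  output[1] <- x
  n <- 1L
  while (x != 1) {
    x <- if (x %% 2 == 0) x / 2 else 3 * x + 1
    n <- n + 1L
    output[n] <- x           # auto-extends if n exceeds the bound
  }
  output[seq_len(n)]         # truncate to the actual path length
}

algo_bound(6)   # 6 3 10 5 16 8 4 2 1
```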
A remark: R is not good at loops. For small loops, for instance over the variables of a dataframe or the files in a directory, there is usually no problem. But if you have a long computation that really needs many iterations and can't be vectorized, R might not be the right tool. On occasion, writing a function in C, C++, Fortran or Java can help: it's fairly easy to build plugins or to use Rcpp, and the performance gain is considerable.
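As a sketch of the Rcpp route (assuming the Rcpp package and a C++ toolchain are installed; algo_cpp is a name chosen here for illustration), std::vector handles the amortized growth on the C++ side:

```r
library(Rcpp)

# Compile a C++ version of the path function; push_back on std::vector
# already does amortized doubling internally.
cppFunction('
NumericVector algo_cpp(double x) {
  std::vector<double> out;
  out.push_back(x);
  while (x != 1) {
    x = (std::fmod(x, 2) == 0) ? x / 2 : 3 * x + 1;
    out.push_back(x);
  }
  return wrap(out);
}')

algo_cpp(6)   # same path as algo(6): 6 3 10 5 16 8 4 2 1
```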
Upvotes: 1
Reputation: 101818
As per your edit, maybe you can check the following solution
algo_TIC2 <- function(x) {
  res <- x
  repeat {
    u <- tail(res, 1)
    if (u != 1) {
      res[length(res) + 1] <- if (u %% 2) 3 * u + 1 else u / 2
    } else {
      return(res)
    }
  }
}
You can use recursion like below
compute <- function(x) if (x %% 2) 3 * x + 1 else x / 2

algo_TIC1 <- function(x) {
  if (x == 1) {
    return(1)
  }
  c(x, algo_TIC1(compute(x)))
}
and you will see
> algo_TIC1(3000)
[1] 3000 1500 750 375 1126 563 1690 845 2536 1268 634 317 952 476 238
[16] 119 358 179 538 269 808 404 202 101 304 152 76 38 19 58
[31] 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16
[46] 8 4 2 1
If you don't want any helper function, i.e., compute, you can try
algo_TIC1 <- function(x) {
  if (x == 1) {
    return(1)
  }
  c(x, algo_TIC1(if (x %% 2) 3 * x + 1 else x / 2))
}
Upvotes: 1
Reputation: 44897
You can set the length of a vector and then make assignments to specific elements. Your code would look like this:
algo2 <- function(x) {
  if (x == 1)
    return(1)
  output <- x
  index <- 1
  while (x != 1) {
    x <- compute(x)
    index <- index + 1
    if (index > length(output))
      length(output) <- 2 * length(output)
    output[index] <- x
  }
  return(output[seq_len(index)])
}
This makes a difference, though not a big one in your example, because all those calls to compute() (and return()!) are quite costly. If you folded that calculation into algo, you'd see more improvement. You could also initialize output to a length that is likely to be good enough for most cases, so that doubling is rarely needed.
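A sketch of both suggestions combined, with compute() inlined into the loop and a larger starting size (32 here is an arbitrary choice for illustration, not a tuned value):

```r
algo3 <- function(x) {
  if (x == 1)
    return(1)
  output <- numeric(32)    # initial guess, hopefully big enough
  output[1] <- x
  index <- 1
  while (x != 1) {
    x <- if (x %% 2 == 0) x / 2 else 3 * x + 1   # inlined compute()
    index <- index + 1
    if (index > length(output))
      length(output) <- 2 * length(output)        # double when full
    output[index] <- x
  }
  output[seq_len(index)]   # drop the unused tail
}

algo3(6)   # 6 3 10 5 16 8 4 2 1
```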
Upvotes: 0