Tyler Rinker

Reputation: 109844

Efficient random reordering of vectors

A randomization test requires the user to randomly reorder a vector (or a similar object) to build a null model.

In my case I have a vector of 10,000 elements that I must resample from. Let's make that now:

x <- sample(c(TRUE, FALSE), 10000, TRUE)

My real data look like x. I want to randomly reorder the vector x n times, which can be done with:

lapply(1:1000, function(i) sample(x))

In this case, 1000 replications take:

start <- Sys.time()
lapply(1:1000, function(i) sample(x))
Sys.time() - start

Time difference of 10.20258 secs

Now consider that some additional computation must take place, and that this is for just one cell of a distance matrix. Multiply this overhead across an i x j matrix and it becomes time consuming. Is there a faster way to reshuffle the x vector n times (preferably in base R)? I use a list structure, but if a matrix structure is more efficient I'm open to whatever works. In my list, each element has exactly the same proportion of TRUE/FALSE as the original x; this is key for the randomization test.

Upvotes: 1

Views: 79

Answers (2)

Sven Hohenstein

Reputation: 81683

vapply is often faster than lapply, though it is not in the benchmark below. Since the samplings do not depend on i, you can also use replicate for simple replication:

fun1 <- function() lapply(1:1000, function(i) sample(x))
fun2 <- function() vapply(1:1000, function(i) sample(x), FUN.VALUE = x)
fun3 <- function() replicate(1000, sample(x), simplify = FALSE)

library(microbenchmark)
microbenchmark(fun1(), fun2(), fun3())

Unit: milliseconds
   expr      min       lq   median       uq       max neval
 fun1() 363.3359 387.9058 531.3358 731.9839  9850.098   100
 fun2() 403.4411 469.3090 587.7403 747.8655 15495.549   100
 fun3() 363.2694 374.1643 516.9334 600.4151  6231.890   100

 # Note that `vapply` returns a matrix, not a list.

The function replicate seems to be slightly more efficient for this task.
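If a matrix is acceptable instead of a list, a minimal sketch could look like the following. It assumes the default simplify = TRUE of replicate, so that each shuffle becomes one column; the seed and the name n_rep are only illustrative and not part of the original question. The last line checks that every shuffle keeps the same number of TRUE values as x, which the randomization test relies on.

```r
set.seed(42)                              # assumed seed, only for reproducibility
x <- sample(c(TRUE, FALSE), 10000, TRUE)  # same toy data as in the question

n_rep <- 1000                             # hypothetical name for the number of shuffles
perms <- replicate(n_rep, sample(x))      # default simplify = TRUE: one shuffle per column

dim(perms)                                # rows = length(x), columns = n_rep
all(colSums(perms) == sum(x))             # each column keeps the original count of TRUE
```

Whether the matrix form is actually faster than the list form on a given machine would need to be benchmarked, for example with microbenchmark as above.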

Upvotes: 4

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

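Printing in R can be slow (and not everything necessarily gets printed to the screen). Because the result of lapply is not assigned in your timing code, it is printed to the console, and that printing is likely what dominates the measured time.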

Try assigning the output instead:

> start <- Sys.time()
> out <- lapply(1:1000, function(i) sample(x))
> Sys.time() - start
Time difference of 0.7525001 secs
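As a sketch of the same idea, base R's system.time can wrap the assignment so that only the computation is measured and nothing is printed. The name timing is just illustrative, and the elapsed time will differ by machine.

```r
x <- sample(c(TRUE, FALSE), 10000, TRUE)  # same toy data as in the question

# Assign inside system.time so the list is stored rather than printed
timing <- system.time(
  out <- lapply(1:1000, function(i) sample(x))
)

timing["elapsed"]                         # wall-clock seconds for the 1000 shuffles
length(out)                               # number of shuffled vectors stored in the list
```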

Upvotes: 4
