Reputation: 109844
A randomization test requires randomly reordering a vector (or similar object) many times to build a null model.
In my case I have a vector of 10,000 elements that I must resample from. Let's make that now:
x <- sample(c(TRUE, FALSE), 10000, TRUE)
So I have real data that looks like `x`. I want to randomly reorder the vector `x` a total of `n` times. This can be accomplished with:
lapply(1:1000, function(i) sample(x))
In this case, 1000 replications take:
start <- Sys.time()
lapply(1:1000, function(i) sample(x))
Sys.time() - start
Time difference of 10.20258 secs
Now consider that some additional computation must take place, and that this is for just one cell in a distance matrix. Multiply this overhead across an `i` x `j` matrix and it gets time consuming. Is there a faster way to reshuffle the `x` vector `n` times (preferably in base R)? I use a list structure, but if a matrix structure is more efficient I'm open to whatever works. In my list, each element has exactly the same proportion of TRUE/FALSE as the original `x`. This is key for the randomization test.
Upvotes: 1
Views: 79
Reputation: 81683
In most cases, `vapply` is faster than `lapply`. You can also consider `replicate` for simple replication, since the samplings do not depend on `i`:
fun1 <- function() lapply(1:1000, function(i) sample(x))
fun2 <- function() vapply(1:1000, function(i) sample(x), FUN.VALUE = x)
fun3 <- function() replicate(1000, sample(x), simplify = FALSE)
library(microbenchmark)
microbenchmark(fun1(), fun2(), fun3())
Unit: milliseconds
   expr      min       lq   median       uq       max neval
 fun1() 363.3359 387.9058 531.3358 731.9839  9850.098   100
 fun2() 403.4411 469.3090 587.7403 747.8655 15495.549   100
 fun3() 363.2694 374.1643 516.9334 600.4151  6231.890   100
# Note that `vapply` returns a matrix, not a list.
The function `replicate` seems to be slightly more efficient for this task.
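Since the question mentions being open to a matrix structure, here is a minimal sketch of that variant. It assumes an `x` built the same way as in the question; the names `n_rep` and `perms` are illustrative only, and the seed is there just to make the sketch reproducible. With `simplify` left at its default, `replicate` should return a logical matrix with one shuffle per column, and the column sums give a quick check that every shuffle keeps the original number of TRUE values.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
x <- sample(c(TRUE, FALSE), 10000, TRUE)
n_rep <- 1000  # hypothetical name for the number of replications

# One permutation of x per column; replicate() simplifies to a matrix by default
perms <- replicate(n_rep, sample(x))

# Rows = elements of x, columns = replications
dim(perms)

# Each column is a reordering of x, so its TRUE count must match the original
all(colSums(perms) == sum(x))
```

Whether the matrix form is actually faster than the list form for the downstream computation is not shown here; it mainly makes column-wise summaries such as `colSums` or `colMeans` convenient.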
Upvotes: 4
Reputation: 193517
Printing in R can be slow (not to mention that not everything always gets printed to the screen).
Try assigning the output instead:
> start <- Sys.time()
> out <- lapply(1:1000, function(i) sample(x))
> Sys.time() - start
Time difference of 0.7525001 secs
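As a small sketch of the same idea, `system.time()` can wrap the assignment so that the result is stored rather than auto-printed and the timing comes back in one call. The `x` here is regenerated as in the question, and `timing` is just an illustrative name; actual timings will vary by machine.

```r
x <- sample(c(TRUE, FALSE), 10000, TRUE)

# system.time() evaluates the expression without printing its value;
# the assignment inside still creates `out` in the calling environment
timing <- system.time(out <- lapply(1:1000, function(i) sample(x)))

timing["elapsed"]  # wall-clock seconds for the 1000 shuffles
length(out)        # number of shuffled vectors stored in the list
```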
Upvotes: 4