I recently started coding in R, and I read that the apply functions are faster than a for loop.
Let's say I want to extract numbers from a vector and insert them into a list. Using a for loop, this is not a problem. However, I'm curious whether this is also possible with an apply function, and whether doing so makes sense at all. I had something like this in mind (which does not work):
some.list <- list()
some.vector <- 1:10
sapply(1:10,function(i){some.list[[i]] <- some.vector[i]})
Upvotes: 0
There are all sorts of different ways to create a list containing the elements of a vector (the one that I would always use would be as.list). You can use R benchmarking packages to test for yourself which is faster:
fun1 <- function(v) as.list(v)
fun2 <- function(v) {
  l <- vector("list", length(v)) # Thanks to @MrFlick for pre-allocation tip
  for (i in seq_along(v)) {
    l[[i]] <- v[i]
  }
  l
}
fun2a <- function(v) {
  l <- vector("list", length(v)) # Thanks to @MrFlick for pre-allocation tip
  sapply(seq_along(v), function(i) l[[i]] <<- v[i])
  l
}
fun3 <- function(v) lapply(v, identity)
fun3a <- function(v) sapply(v, identity, simplify=FALSE)
fun4 <- function(v) unname(split(v, seq_along(v)))
v <- 1:10000
# Check if all return same thing (see http://stackoverflow.com/a/30850654/3093387)
all(sapply(list(fun2(v), fun2a(v), fun3(v), fun3a(v), fun4(v)), identical, fun1(v)))
# [1] TRUE
library(microbenchmark)
microbenchmark(fun1(v), fun2(v), fun2a(v), fun3(v), fun3a(v), fun4(v))
# Unit: microseconds
#     expr       min         lq       mean    median         uq       max neval
#  fun1(v)   139.543   178.5015   283.7498   218.720   288.1555  3730.439   100
#  fun2(v)  6809.344  7465.1110  9326.7799  7912.763 10881.0305 16963.567   100
# fun2a(v) 10790.471 13786.2335 15912.5338 15089.547 15787.3085 71504.328   100
#  fun3(v)  4132.854  4545.2085  6612.3504  4768.798  7947.0820 63608.519   100
# fun3a(v)  4147.731  4537.0010  5887.4457  4805.952  7604.4250 13613.517   100
#  fun4(v)  3341.360  3508.2995  3798.4246  3599.220  3797.1200  7565.591   100
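To see what the pre-allocation in fun2 buys, a variant that grows the list one element at a time can be added to the comparison. This is only a minimal sketch: fun2b is a hypothetical name and was not part of the benchmark above, so no timing is claimed for it here.

```r
# Hypothetical variant of fun2 that grows the list instead of pre-allocating it
fun2b <- function(v) {
  l <- list()
  for (i in seq_along(v)) {
    l[[i]] <- v[i]  # assigning past the current end extends the list by one
  }
  l
}

# It builds the same list as as.list, so it can go into the same
# microbenchmark() call as the other functions.
v.small <- 1:1000
identical(fun2b(v.small), as.list(v.small))
```

How large the gap to fun2 is will depend on the R version and the vector length, so it is worth measuring on your own machine.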
For a list of length 10000, as.list is about 15x faster than lapply, sapply with simplify=FALSE, or split. In turn, these three options are 2-3x faster than a for loop or sapply with <<- (using pre-allocated output lists; it is about 75x slower if we don't pre-allocate). In short, sapply and for had similar runtimes (sapply actually appeared a bit slower), and both are much slower than vectorized functions for this operation.
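As for why the attempt in the question does nothing: the assignment happens inside the anonymous function, so R modifies a local copy of some.list and throws it away when the function returns. A small sketch of that behaviour and of the simpler alternatives (fixed.list is a hypothetical name used only for illustration):

```r
some.vector <- 1:10

# Attempt from the question: some.list is changed only inside the function,
# so the list in the calling environment stays empty.
some.list <- list()
invisible(sapply(1:10, function(i) { some.list[[i]] <- some.vector[i] }))
length(some.list)

# Returning the value from the function and keeping the result works instead
fixed.list <- lapply(some.vector, function(x) x)

# as.list gives the same result directly, without any apply call
identical(fixed.list, as.list(some.vector))
```

Using <<- as in fun2a also makes the assignment reach the outer list, but it relies on a side effect, and in the timings above it was slower than the plain for loop.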
Upvotes: 2