Reputation: 73397
I have a function that creates matrices. Later I have to do some time-consuming work with every single matrix in the set. So far, I'm bundling these matrices into a list with lapply(). Now, I assume operating on an array of matrices would be much faster. The thing is, I don't know how to have the matrices generated directly into an array, the way lapply() generates them into a list.
Here is an example:
# matrix generating function
mxSim <- function(X, n) {
  mx <- matrix(NA, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  return(mx)
}
# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)
# some stuff to be done after, like e. g.:
lapply(mx.lst, function(m) lm(d ~ alpha + beta, as.data.frame(m)))
Could anybody give me some advice on how to do this with an array?
I've looked at this answer, but it requires the matrices to be generated already, and the only way I could make it work was by putting them into a list first.
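A minimal base-R sketch, assuming the goal is simply to have the matrices land in one numeric 3-D array (rows x columns x matrices) instead of a list: `vapply()` checks each result against a template matrix given as `FUN.VALUE` and stacks the results along a third dimension. The names `k`, `mx.arr` and `fits` are illustrative only. Note that each slice still has to go through `as.data.frame()` and `lm()`, so this changes the container, not the per-matrix work.

```r
# mxSim() from the question (formatting only changed).
mxSim <- function(X, n) {
  mx <- matrix(NA, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  mx
}

n <- 1e4   # rows per matrix, as in the question
k <- 10    # number of matrices (illustrative)

# vapply() requires every call to return a double n x 3 matrix (the template
# in FUN.VALUE) and stacks the results into an n x 3 x k array; the column
# names are taken from the first result.
mx.arr <- vapply(seq_len(k), mxSim, FUN.VALUE = matrix(0, n, 3), n = n)

# One model per slice; mx.arr[, , i] is an n x 3 matrix that keeps its
# column names, so the formula can refer to d, alpha and beta.
fits <- lapply(seq_len(dim(mx.arr)[3]),
               function(i) lm(d ~ alpha + beta, as.data.frame(mx.arr[, , i])))
```

`replicate(k, mxSim(NULL, n))` should produce the same n x 3 x k shape, because `replicate()` simplifies to an array by default.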
Reputation: 73397
For the sake of completeness, I ran some more benchmarks with 1e3 matrices, as suggested in the comment on @SeldomSeenSlim's answer. I also tried a list of data.frames, and the result was a bit surprising.
Here is the function for data.frames; the matrix function is in the question above.
dfSim <- function(X, n) {
  d <- rbinom(n, 1, .375)
  alpha <- rnorm(n, 0, 2)
  beta <- .42 * rnorm(n, 0, 6)
  data.frame(d, alpha, beta)
}
Bundling:
mx.lst <- lapply(1:1e3, mxSim, n = 1e4)
mx.array <- array(mx.lst, dim = c(2, 500))
df.lst <- lapply(1:1e3, dfSim, n = 1e4)
And the microbenchmarks:
library(microbenchmark)
some.fnc <- function(m) lm(d ~ alpha + beta, as.data.frame(m))
list.test <- microbenchmark(lapply(mx.lst, some.fnc))
array.test <- microbenchmark(apply(mx.array, MARGIN = c(1, 2), some.fnc))
df.list.test <- microbenchmark(lapply(df.lst, some.fnc))
Results:
Unit: seconds
expr             min       lq     mean   median       uq      max neval
lapply      9.658568 9.742613 9.831577 9.784711 9.911466 10.30035   100
apply       9.727057 9.951213 9.994986 10.00614 10.06847 10.22178   100
lapply(df)  9.121293 9.229912 9.286592 9.277967 9.327829 10.12548   100
So, what does this tell us?
As a side note, generating the list of matrices is slightly faster than generating the list of data.frames:
microbenchmark((lapply(1:1e3, mxSim, n = 1e4)), (lapply(1:1e3, dfSim, n = 1e4)))
expr                 min       lq     mean   median       uq      max neval cld
(lapply(mxSim)) 2.533466 2.551199 2.563864 2.555421 2.559234 2.693383   100   a
(lapply(dfSim)) 2.676869 2.695826 2.718454 2.701161 2.706249 3.293431   100   b
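One possible reading of the `lapply(df)` row, offered as a hypothesis to check rather than a finding: `some.fnc()` calls `as.data.frame()` on every element, which for a data.frame returns its input almost unchanged but for a matrix has to build a new data.frame each time. The sketch below times the conversion and the fit separately, assuming the `mxSim()` and `dfSim()` definitions from this thread; the seed, `reps` and the variable names are arbitrary choices for illustration.

```r
# Generators from this thread (formatting only changed).
mxSim <- function(X, n) {
  mx <- matrix(NA, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  mx
}
dfSim <- function(X, n) {
  d <- rbinom(n, 1, .375)
  alpha <- rnorm(n, 0, 2)
  beta <- .42 * rnorm(n, 0, 6)
  data.frame(d, alpha, beta)
}

set.seed(1)
n <- 1e4
reps <- 100  # illustrative repetition count
m <- mxSim(1, n)
d.frame <- dfSim(1, n)

# Conversion only: a matrix has to be turned into a new data.frame,
# whereas as.data.frame() on a data.frame has almost nothing to do.
t.conv.mx <- system.time(for (i in seq_len(reps)) as.data.frame(m))[["elapsed"]]
t.conv.df <- system.time(for (i in seq_len(reps)) as.data.frame(d.frame))[["elapsed"]]

# Fit only: the data.frame is built once, outside the timed loop.
m.as.df <- as.data.frame(m)
t.fit <- system.time(for (i in seq_len(reps)) lm(d ~ alpha + beta, m.as.df))[["elapsed"]]

# Elapsed seconds for reps repetitions of each step.
print(c(conv.matrix = t.conv.mx, conv.df = t.conv.df, fit = t.fit))
```

If `conv.matrix` turns out small compared with `fit`, the conversion alone would not explain the gap between the `lapply` and `lapply(df)` rows.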
Reputation: 841
Enough with the hoo-ha. Let's time it.
library(microbenchmark)
# matrix generating function
mxSim <- function(X, n) {
  mx <- matrix(NA, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  return(mx)
}
# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)
mx.array <- array(mx.lst, dim = c(2, 5))
# some stuff to be done after, e.g.:
# timing...
some.fnc <- function(m) lm(d ~ alpha + beta, as.data.frame(m))
list.test <- microbenchmark(lapply(mx.lst, some.fnc))
array.test <- microbenchmark(apply(mx.array, MARGIN = c(1, 2), some.fnc))
expr       min       lq     mean   median       uq      max neval
lapply 74.8953 101.9424 173.8733 146.7186 234.7577 397.2494   100
apply  77.2362 101.0338 174.4178  137.153 264.6854 418.7297   100
Naively applying a function over a list versus over an array gives almost identical performance.
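This helps explain the near-identical timings: `array(mx.lst, dim = c(2, 5))` does not build a numeric array, it only attaches a `dim` attribute to the list, so `apply()` still visits the stored matrices one by one, much as `lapply()` does. A small base-R check; `mx.num` is an illustrative name, and `simplify2array()` is used here only to contrast with a genuine numeric array.

```r
# mxSim() as in the question (formatting only changed).
mxSim <- function(X, n) {
  mx <- matrix(NA, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  mx
}

set.seed(1)
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)

# array() on a list only adds a dim attribute: the result is still a list
# whose elements are the original matrices.
mx.array <- array(mx.lst, dim = c(2, 5))
typeof(mx.array)        # "list"
dim(mx.array[[1, 1]])   # 10000 3

# simplify2array() instead copies all matrices into one numeric
# 10000 x 3 x 10 array, keeping the column names on the second dimension.
mx.num <- simplify2array(mx.lst)
typeof(mx.num)          # "double"
dim(mx.num)             # 10000 3 10
```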