jay.sf

Reputation: 73397

How to generate matrices directly into an array with a function?

I have a function that generates matrices. Later I have to do some time-consuming work on every matrix in the set. So far I'm bundling the matrices into a list with lapply(). I assume that operating on an array of matrices would be much faster, but I don't know how to have the matrices generated directly into an array, the way lapply() generates them into a list.

Here is an example:

# matrix generating function
mxSim <- function(X, n) {
  mx = matrix(NA, nrow = n, ncol = 3, 
              dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[,1] = rbinom(n, 1, .375)
  mx[,2] = rnorm(n, 0, 2)
  mx[,3] = .42 * rnorm(n, 0, 6)
  return(mx)
}

# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)

# some stuff to be done after, like e. g.:
lapply(mx.lst, function(m) lm(d ~ alpha + beta, as.data.frame(m)))

Could anybody give me some advice on how to do this with an array?

I've looked into this answer, but it requires the matrices to be generated already, so the only workaround I found was to collect them in a list first.

Upvotes: 0

Views: 54

Answers (2)

jay.sf

Reputation: 73397

For the sake of completeness I ran some more benchmarks with 1e3 matrices, as suggested in the comment on @SeldomSeenSlim's answer. I also ran the same test with a list of data.frames, and the result was a bit surprising.

Here is the function for data.frames; for the matrix function, see the question above.

dfSim <- function(X, n) {
  d <- rbinom(n, 1, .375)
  alpha <- rnorm(n, 0, 2)
  beta <- .42 * rnorm(n, 0, 6)
  data.frame(d, alpha, beta)
}

Bundling

mx.lst <- lapply(1:1e3, mxSim, n = 1e4)
mx.array <- array(mx.lst, dim = c(2, 500))
df.lst <- lapply(1:1e3, dfSim, n = 1e4)

And the microbenchmarks:

library(microbenchmark)
some.fnc <- function(m) lm(d ~ alpha + beta, as.data.frame(m))
list.test <- microbenchmark(lapply(mx.lst, some.fnc))
array.test <- microbenchmark(apply(mx.array, MARGIN = c(1, 2), some.fnc))
df.list.test <- microbenchmark(lapply(df.lst, some.fnc))

Results

Unit: seconds
expr            min       lq     mean   median       uq      max neval
lapply     9.658568 9.742613 9.831577 9.784711 9.911466 10.30035   100
apply      9.727057 9.951213 9.994986 10.00614 10.06847 10.22178   100
lapply(df) 9.121293 9.229912 9.286592 9.277967 9.327829 10.12548   100

So, what does this tell us?

As a side note, here is how long generating the data itself takes with each function:

microbenchmark((lapply(1:1e3, mxSim, n = 1e4)), (lapply(1:1e3, dfSim, n = 1e4)))
           expr      min       lq     mean   median       uq      max neval cld
(lapply(mxSim)) 2.533466 2.551199 2.563864 2.555421 2.559234 2.693383   100  a 
(lapply(dfSim)) 2.676869 2.695826 2.718454 2.701161 2.706249 3.293431   100   b

Upvotes: 0

SeldomSeenSlim

Reputation: 841

Enough with the hooha. Let's time it.

library(microbenchmark)
# matrix generating function
mxSim <- function(X, n) {
  mx = matrix(NA, nrow = n, ncol = 3, 
              dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[,1] = rbinom(n, 1, .375)
  mx[,2] = rnorm(n, 0, 2)
  mx[,3] = .42 * rnorm(n, 0, 6)
  return(mx)
}

# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)

mx.array <- array(mx.lst, dim = c(2, 5))

# Timing: the same model fit, applied over the list and over the array
some.fnc <- function(m) lm(d ~ alpha + beta, as.data.frame(m))

list.test <- microbenchmark(lapply(mx.lst, some.fnc))

array.test <- microbenchmark(apply(mx.array, MARGIN = c(1, 2), some.fnc))
 expr     min       lq     mean   median       uq      max neval
 lapply: 74.8953 101.9424 173.8733 146.7186 234.7577 397.2494   100
 apply:  77.2362 101.0338 174.4178 137.153  264.6854 418.7297   100

Naively applying a function over a list as opposed to an array is almost identical in actual performance.
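
Note that array(mx.lst, dim = ...) keeps the matrices as list elements and only adds a dim attribute to the list. If a single numeric three-dimensional array is what is wanted, one possible sketch is to let sapply() stack the matrices with simplify = "array". The name mx.arr, the smaller n and the fixed seed are assumptions made for illustration only, and nothing here suggests the model fits themselves get faster this way.

```r
# Same generator as in the question (X is the unused index supplied by sapply).
mxSim <- function(X, n) {
  mx <- matrix(NA_real_, nrow = n, ncol = 3,
               dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[, 1] <- rbinom(n, 1, .375)
  mx[, 2] <- rnorm(n, 0, 2)
  mx[, 3] <- .42 * rnorm(n, 0, 6)
  mx
}

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# simplify = "array" stacks equally sized matrices along a new third dimension,
# so the result has dimensions n x 3 x (number of matrices).
mx.arr <- sapply(1:10, mxSim, n = 1e3, simplify = "array")
dim(mx.arr)

# Fit the model on each slice; mx.arr[, , k] is the k-th matrix,
# and it keeps the column names d, alpha and beta.
fits <- lapply(seq_len(dim(mx.arr)[3]), function(k) {
  lm(d ~ alpha + beta, data = as.data.frame(mx.arr[, , k]))
})
```

The slices are pulled out by explicit indexing rather than apply(), which keeps it obvious that each model sees one complete matrix.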

Upvotes: 1
