jay.sf
jay.sf

Reputation: 73004

How to bind observations from a function with multiple arguments into rows?

I want to do a simulation to prove something. Therefore I created a function that generates one observation, very similar to this below.

set.seed(458007)

fun1 <- function(M, x, y){
  M <- matrix(NA, nrow = 1, ncol = 4)
  M[, 1] <- x
  M[, 2] <- y
  M[, 3] <- mean(rnorm(x, 0, y))
  M[, 4] <- mean(rnorm(x, 1, y))
  return(M)
}

fun1(x=1e3, y=.5)
#      [,1] [,2]        [,3]      [,4]
# [1,] 1000  0.5 0.001414806 0.9875602

For the simulation now I want to bind repeated observations with different arguments into rows and chose an lapply() approach. Though I brought following code to work, it is ironically the slowest one. Note that in a related question I slightly overminimalized the example of my problem which is actually somewhat nested. Perhaps anyone could help me to find a faster solution?

#  lapply
do.call(rbind, lapply(c(.5, 1, 1.5), function(y)
  do.call(rbind, lapply(c(1e3, 1e4, 1e5), 
         function(x) do.call(rbind, lapply(1:5, fun1, x, y))))))
#        [,1] [,2]          [,3]      [,4]
#  [1,] 1e+03  0.5  0.0156969547 0.9878933
#  [2,] 1e+03  0.5  0.0187202908 1.0011313
#  [3,] 1e+03  0.5 -0.0351017539 0.9953701
#  [4,] 1e+03  0.5 -0.0129749736 1.0112514
#  [5,] 1e+03  0.5 -0.0154776052 0.9793552
#  [6,] 1e+04  0.5 -0.0070121049 1.0022838
#  [7,] 1e+04  0.5 -0.0064961931 0.9967966
#  [8,] 1e+04  0.5 -0.0054208002 0.9955582
#  [9,] 1e+04  0.5 -0.0027074479 1.0019217
# [10,] 1e+04  0.5  0.0047017946 1.0069838
# [11,] 1e+05  0.5 -0.0018550320 0.9981459
# [12,] 1e+05  0.5 -0.0019201731 0.9973762
# [13,] 1e+05  0.5 -0.0031555017 1.0016808
# [14,] 1e+05  0.5 -0.0005508661 1.0001200
# [15,] 1e+05  0.5  0.0002928878 0.9991147
# [16,] 1e+03  1.0  0.0043441072 0.9579204
# [17,] 1e+03  1.0 -0.0059409534 1.0068553
# [18,] 1e+03  1.0  0.0850053171 1.0316056
# [19,] 1e+03  1.0 -0.0145192268 1.0193467
# [20,] 1e+03  1.0  0.0104437603 0.9959815
# [21,] 1e+04  1.0  0.0252303898 0.9968866
# [22,] 1e+04  1.0  0.0039449755 0.9818866
# [23,] 1e+04  1.0  0.0145974970 0.9814802
# [24,] 1e+04  1.0 -0.0016105680 0.9968357
# [25,] 1e+04  1.0  0.0058877101 1.0049794
# [26,] 1e+05  1.0  0.0015416062 1.0008094
# [27,] 1e+05  1.0  0.0004725605 1.0001917
# [28,] 1e+05  1.0 -0.0007963141 1.0019771
# [29,] 1e+05  1.0 -0.0007302225 0.9969158
# [30,] 1e+05  1.0  0.0023877190 1.0060436
# [31,] 1e+03  1.5  0.0165765473 0.9391917
# [32,] 1e+03  1.5 -0.0990828503 1.0256720
# [33,] 1e+03  1.5  0.0526152728 0.9981981
# [34,] 1e+03  1.5  0.1472273215 0.9442844
# [35,] 1e+03  1.5  0.0346540383 1.0316669
# [36,] 1e+04  1.5 -0.0007479431 0.9800219
# [37,] 1e+04  1.5  0.0189053160 1.0284075
# [38,] 1e+04  1.5  0.0062155928 0.9821324
# [39,] 1e+04  1.5 -0.0065533501 1.0085699
# [40,] 1e+04  1.5 -0.0161694486 1.0126392
# [41,] 1e+05  1.5 -0.0090145992 0.9952551
# [42,] 1e+05  1.5 -0.0024756213 1.0054282
# [43,] 1e+05  1.5  0.0061985946 0.9966108
# [44,] 1e+05  1.5  0.0023640342 0.9988624
# [45,] 1e+05  1.5  0.0014610948 0.9956877

Benchmark with @lefft's and @Parfait's solutions (from example)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval
#       mine 325.8589 347.1616 405.6944 398.6682 434.9392 906.7906   100
#      lefft 327.6870 348.3504 393.7511 393.2127 421.4536 694.1610   100
#    Parfait 323.5595 343.5806 396.9973 390.9864 423.0759 736.2887   100

Upvotes: 1

Views: 58

Answers (1)

lefft
lefft

Reputation: 2105

Here is a nice compact way to do it:

# create the input data *before* applying the func to it 
dat <- expand.grid(Ms=1:5, xs=c(1e3, 1e4, 1e5), ys=c(.5, 1, 1.5))

# apply the function to each row of the data frame (`MARGIN=1` is row-wise) 
# (the outermost function `t()` transposes the result so it looks like yours)
t(apply(dat, MARGIN=1, FUN=function(row) fun1(M=row[1], x=row[2], y=row[3])))

It is about 10% faster than the original solution, but the difference might be different with bigger data. One thing to note is that if you create the param grid first (as is done here), then the time spent on computing with fun1() can be isolated more easily (i.e. you can tell what's taking a long time -- the computing or the creation of the input data frame).

Hope this helps!

Upvotes: 1

Related Questions