Reputation: 10966
In a simulation study, I have multiple datasets with the same column variables but different numbers of rows; each dataset is simulated based on n
, its sample size. For example:
n.vec = c(5, 10, 15, 20)
data1 = data.frame(x=rnorm(n.vec[1], 0, 1), y=rnorm(n.vec[1], 0, 1))
data2 = data.frame(x=rnorm(n.vec[2], 1, 2), y=rnorm(n.vec[2], 1, 2))
data3 = data.frame(x=rnorm(n.vec[3], 3, 4), y=rnorm(n.vec[3], 3, 4))
data4 = data.frame(x=rnorm(n.vec[4], 5, 6), y=rnorm(n.vec[4], 5, 6))
mega.data = rbind(data1, data2, data3, data4)
Now, without using a loop (n.vec
may be of any length), is there an efficient way to row-bind all of the datasets into a single "mega dataset" like mega.data
above? Preferably using the dplyr
package. Thanks.
Upvotes: 1
Views: 156
Reputation: 43354
A simple way is to collect your parameters in a data frame, and then use purrr::pmap
to iterate over its rows to produce the data frame you want. Its pmap_df
variant will call dplyr::bind_rows
on the results, binding them into a single data frame.
library(tidyverse)
set.seed(47)
mega_data <- data_frame(n = c(5, 10, 15, 20),
                        mean = c(0, 1, 3, 5),
                        sd = c(1, 2, 4, 6)) %>% 
    pmap_df(~list(x = rnorm(...),
                  y = rnorm(...)))
mega_data
#> # A tibble: 50 x 2
#> x y
#> <dbl> <dbl>
#> 1 1.99 -1.09
#> 2 0.711 -0.985
#> 3 0.185 0.0151
#> 4 -0.282 -0.252
#> 5 0.109 -1.47
#> 6 -0.845 -2.13
#> 7 1.08 1.50
#> 8 1.99 0.319
#> 9 -2.66 1.83
#> 10 1.18 0.347
#> # ... with 40 more rows
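If you'd rather avoid the tidyverse, a minimal base-R sketch of the same idea (same n/mean/sd parameters as above; names are my own) uses Map to build one data frame per parameter triple and do.call(rbind, ...) to stack them:

```r
set.seed(47)
n.vec    <- c(5, 10, 15, 20)
mean.vec <- c(0, 1, 3, 5)
sd.vec   <- c(1, 2, 4, 6)

# Build one data frame per (n, mean, sd) triple, then row-bind them all
dfs <- Map(function(n, m, s) data.frame(x = rnorm(n, m, s),
                                        y = rnorm(n, m, s)),
           n.vec, mean.vec, sd.vec)
mega.data <- do.call(rbind, dfs)

nrow(mega.data)  # 50, i.e. sum(n.vec)
```

Map iterates the parameter vectors in parallel (so n.vec can be any length), and do.call(rbind, ...) plays the role of dplyr::bind_rows.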
Upvotes: 2