Reputation: 4272
I'm trying to generate many random datasets with a specified number of rows and store them in a list. I could use a for
loop, but I'm trying to figure out how to do this with apply().
My inclination is to initialize an empty list, then assign randomly generated data frames to each element using lapply(),
but I'm not sure how to specify the number of rows for each data frame using a vector of numeric values. A minimal working example, with pseudo-code for the last step, is below.
I'm particularly interested in base R solutions for various reasons.
# Store 20 dataframes with [1,50000] rows in list
n_df <- 20
df_rows <- sample(1:50000, n_df)
df_list <- vector(mode = "list", length = n_df)
# Not sure how to pass each value of df_rows to rnorm,
# currently just generates 20 random values per data
# frame instead of the number of rows specified in
# each element of df_rows.
df_list <- lapply(df_list, function(df){ df <- data.frame(z = rnorm(df_rows))})
Upvotes: 0
Views: 201
Reputation: 887901
We could do this more easily with replicate instead of creating an empty list:
df_list <- replicate(n_df,
data.frame(z = rnorm(sample(1:50000, 1))), simplify = FALSE)
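Or a similar option with rerun (note that rerun is deprecated as of purrr 1.0.0; tibble() comes from the tibble package, so it is loaded here too):
library(purrr)
library(tibble)
df_list <- rerun(n_df, tibble(z = rnorm(sample(1:50000, 1))))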
Another option is to Vectorize rnorm so that it accepts a vector of 'n' values, then loop over the resulting list with lapply and convert each vector to a data.frame:
lapply(Vectorize(rnorm)(df_rows), function(x) data.frame(z = x))
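One caveat worth checking: Vectorize() is built on mapply(), which by default simplifies its result to a matrix when every draw happens to have the same length. Below is a minimal sketch, assuming the same n_df and df_rows as in the question, that passes SIMPLIFY = FALSE so the result is always a list. The name vec_rnorm and the seed are made up for this illustration.

```r
set.seed(1)                       # assumption: seed only for reproducibility
n_df <- 20
df_rows <- sample(1:50000, n_df)

# Vectorize only over 'n'; SIMPLIFY = FALSE keeps the result a list
# even if all requested lengths are equal
vec_rnorm <- Vectorize(rnorm, vectorize.args = "n", SIMPLIFY = FALSE)

df_list <- lapply(vec_rnorm(df_rows), function(x) data.frame(z = x))

# row counts of the generated data frames, to compare against df_rows
row_counts <- vapply(df_list, nrow, integer(1))
```

Comparing row_counts with df_rows is a quick way to confirm that each data frame received the requested number of rows.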
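Or another option is to call rnorm once for the total number of elements and then split the result using the cumulative row counts as end positions:
v1 <- rnorm(sum(df_rows))
i1 <- cumsum(df_rows)
# start positions are 1 and each previous end + 1; end positions are the cumulative sums
Map(function(i, j) data.frame(z = v1[i:j]), c(1, i1[-length(i1)] + 1), i1)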
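The same "draw once, then split" idea can also be sketched with base R's split(), which avoids computing start and end positions by hand. This is only an illustration under the question's setup. The grouping vector grp and the seed are assumptions introduced here.

```r
set.seed(1)                                      # assumption: seed only for reproducibility
n_df <- 20
df_rows <- sample(1:50000, n_df)

v1 <- rnorm(sum(df_rows))                        # one draw covering all data frames
grp <- rep(seq_along(df_rows), times = df_rows)  # group id for every drawn value

# split() returns one numeric vector per group; wrap each in a data frame
df_list <- lapply(split(v1, grp), function(x) data.frame(z = x))
df_list <- unname(df_list)                       # drop the "1", "2", ... names added by split()
```

Because every entry of df_rows is at least 1, each group id appears in grp, so the list has exactly n_df elements in the original order.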
Or use a for loop, as the OP already initialized an empty list of length 'n_df':
for(i in seq_along(df_list)) df_list[[i]] <- data.frame(z = rnorm(df_rows[i]))
Or an option with the tidyverse, where we loop over the values of 'df_rows' in map, draw from rnorm based on that 'n' value, and convert the result to a tibble:
library(purrr)
library(tibble)
map(df_rows, ~ tibble(z = rnorm(.x)))
[[1]]
# A tibble: 43,497 x 1
z
<dbl>
1 2.72
2 0.217
3 -0.695
4 0.0398
5 -1.62
6 0.474
7 -0.763
8 -0.489
9 0.0898
10 2.42
# … with 43,487 more rows
[[2]]
# A tibble: 20,681 x 1
z
<dbl>
1 0.720
2 -0.704
3 1.72
4 -0.402
5 -2.38
6 -0.192
7 0.780
8 -1.87
9 0.734
10 -1.60
# … with 20,671 more rows
#...
Upvotes: 1
Reputation: 2945
n_df <- 20
df_rows <- sample(1:50000, n_df)
df_list <- lapply(1:n_df, function(x){
data.frame(z = rnorm(df_rows[[x]]))
})
You can also do this without pre-sampling the number of rows in each (if desired):
df_list <- lapply(1:n_df, function(x){
data.frame(z = rnorm(sample(1:50000, 1)))
})
As Onyambu suggested below, this can be further simplified to:
df_list <- lapply(df_rows, function(x){
data.frame(z = rnorm(x))
})
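To confirm that this does what the question asks, the row counts of the result can be compared against df_rows. A small sketch, assuming the simplified lapply version above. The seed and the name row_counts are illustrative additions.

```r
set.seed(42)                      # assumption: seed only for reproducibility
n_df <- 20
df_rows <- sample(1:50000, n_df)

df_list <- lapply(df_rows, function(x) {
  data.frame(z = rnorm(x))
})

# one row count per data frame; vapply enforces an integer result
row_counts <- vapply(df_list, nrow, integer(1))

# stops with an error if any data frame has the wrong number of rows
stopifnot(identical(row_counts, df_rows))
```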
Upvotes: 3