socialscientist

Reputation: 4272

R: Apply a function to a list with a vector of arguments

I'm trying to generate many random datasets, each with a specified number of rows, and store them in a list. I could use a for loop, but I'd like to figure out how to do this with the apply() family.

My inclination is to initialize an empty list, then assign a randomly generated data frame to each element using lapply(), but I'm not sure how to specify the number of rows for each data frame using a vector of numeric values. A minimal working example, with pseudo-code for the last step, is below.

I'm particularly interested in base R solutions for various reasons.

# Store 20 dataframes with [1,50000] rows in list
n_df <- 20
df_rows <- sample(1:50000, n_df)
df_list <- vector(mode = "list", length = n_df)

# Not sure how to pass each value of df_rows to rnorm,
# currently just generates 20 random values per data
# frame instead of the number of rows specified in
# each element of df_rows.
df_list <- lapply(df_list, function(df){ df <- data.frame(z = rnorm(df_rows))})

Upvotes: 0

Views: 201

Answers (2)

akrun

Reputation: 887901

We could do this more easily with replicate instead of creating an empty list first:

df_list <- replicate(n_df,  
             data.frame(z = rnorm(sample(1:50000, 1))), simplify = FALSE)

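Or a similar option with rerun (note that rerun is deprecated as of purrr 1.0.0, and tibble() comes from the tibble package):

library(purrr)
library(tibble)
df_list <- rerun(n_df, tibble(z = rnorm(sample(1:50000, 1))))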

Another option is to Vectorize rnorm so that it accepts a vector of 'n' values, then loop over the resulting list with lapply and convert each vector to a data.frame:

lapply(Vectorize(rnorm)(df_rows), function(x) data.frame(z = x))
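
One caveat with the Vectorize approach, sketched here on small illustrative sizes rather than the OP's values: Vectorize is built on mapply, which simplifies its result to a matrix when every element happens to have the same length. Calling mapply directly with SIMPLIFY = FALSE should always give a list. The names sizes, m, vec_list and df_small are made up for this sketch.

```r
# Small illustrative sizes (hypothetical); all equal on purpose
sizes <- c(3, 3, 3)

# Vectorize() wraps mapply() with SIMPLIFY = TRUE, so equal lengths yield a matrix
m <- Vectorize(rnorm)(sizes)
is.matrix(m)

# mapply() with SIMPLIFY = FALSE returns a list regardless of the lengths
vec_list <- mapply(rnorm, sizes, SIMPLIFY = FALSE)
df_small <- lapply(vec_list, function(x) data.frame(z = x))
sapply(df_small, nrow)
```

With df_rows sampled without replacement, as in the question, the sizes are all distinct, so the simplification would only matter for n_df = 1 or for sizes chosen some other way.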

Another option is to draw all the values with a single rnorm call on the total number of rows and then split the result into chunks:

v1 <- rnorm(sum(df_rows))
i1 <- cumsum(df_rows)
# each chunk starts one past the previous chunk's end and ends at the cumulative sum
Map(function(i, j) data.frame(z = v1[i:j]), c(1, i1[-length(i1)] + 1), i1)
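
The same "draw once, then split" idea can also be sketched with split, using a grouping index built by rep instead of start and end positions. The sizes below are a small hypothetical stand-in for df_rows, and the names v, grp and split_list are made up for this sketch.

```r
set.seed(1)                # only so the sketch is reproducible
sizes <- c(2, 5, 3)        # hypothetical stand-in for df_rows

v <- rnorm(sum(sizes))     # one draw covering all rows
grp <- rep(seq_along(sizes), times = sizes)  # 1 1 2 2 2 2 2 3 3 3

# split() cuts v into one chunk per group; wrap each chunk in a data frame
split_list <- lapply(split(v, grp), function(x) data.frame(z = x))
unname(sapply(split_list, nrow))
```

The row counts come back in the same order as sizes, because split orders the groups by the integer index.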

Or use a for loop, since the OP already initialized an empty list of length 'n_df':

for(i in seq_along(df_list)) df_list[[i]] <- data.frame(z = rnorm(df_rows[i]))

Or an option with the tidyverse, where we loop over the values of 'df_rows' with map, draw rnorm values for each 'n', and convert the result to a tibble:

library(purrr)
library(tibble)
map(df_rows, ~ tibble(z = rnorm(.x)))
[[1]]
# A tibble: 43,497 x 1
         z
     <dbl>
 1  2.72  
 2  0.217 
 3 -0.695 
 4  0.0398
 5 -1.62  
 6  0.474 
 7 -0.763 
 8 -0.489 
 9  0.0898
10  2.42  
# … with 43,487 more rows

[[2]]
# A tibble: 20,681 x 1
        z
    <dbl>
 1  0.720
 2 -0.704
 3  1.72 
 4 -0.402
 5 -2.38 
 6 -0.192
 7  0.780
 8 -1.87 
 9  0.734
10 -1.60 
# … with 20,671 more rows
#...

Upvotes: 1

Brigadeiro

Reputation: 2945

n_df <- 20
df_rows <- sample(1:50000, n_df)

df_list <- lapply(1:n_df, function(x){
  data.frame(z = rnorm(df_rows[[x]]))
})

You can also do this without pre-sampling the number of rows for each data frame (if desired):

df_list <- lapply(1:n_df, function(x){
  data.frame(z = rnorm(sample(1:50000, 1)))
})

As Onyambu suggested below, this can be further simplified to:

df_list <- lapply(df_rows, function(x){
  data.frame(z = rnorm(x))
})
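
As a quick sanity check, here is a minimal sketch that verifies each data frame ends up with the requested number of rows. It uses a smaller n_df and row range than the question so that it runs quickly; those values and the name row_counts are assumptions for illustration only.

```r
set.seed(42)                      # only so the sketch is reproducible
n_df <- 5                         # smaller than the question's 20
df_rows <- sample(1:1000, n_df)   # hypothetical smaller range

df_list <- lapply(df_rows, function(n) data.frame(z = rnorm(n)))

# Each data frame should have exactly the requested number of rows
row_counts <- vapply(df_list, nrow, integer(1))
stopifnot(identical(row_counts, df_rows))
```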

Upvotes: 3
