Create simulated dataframe in dplyr from another dataframe

Question

Let's say I have the following summary of pilot data:

pilot_data = read.table(text = "pairing male dv_mean dv_sd
AA  0   1.4377551   11.99576    
AA  1   0.1745918   10.03553    
AB  0   12.6574286  17.76540    
AB  1   9.5337037   13.92486    
BA  0   8.8971111   16.49538    
BA  1   8.8706557   17.13532    
BB  0   1.6339286   12.72830    
BB  1   -0.1433333  13.68828", header = TRUE)

I'd like to create a simulated dataset in dplyr for each combination of pairing and male, with the same mean and standard deviation as that cell. So, for example, if I wanted 300 rows for each combination, I'd do something like:

tester = pilot_data %>% group_by(pairing, male) %>%
  mutate(simulated_data = rnorm(mean = dv_mean, sd = dv_sd, n = 300))

Except this obviously won't work: mutate expects one value per row, but rnorm returns 300 values for each one-row group, so I get a recycling error. I can use a for loop and append each simulated dataset to the previous ones, but I'm trying to learn how to do this in a dplyr chain.

What's the best way to achieve this?

akrun · Accepted Answer

We can use summarise instead of mutate: summarise can return more than one row per group, whereas mutate must return exactly as many rows as it receives. Note that in dplyr 1.1.0 and later, returning more than one row per group from summarise is deprecated.

library(dplyr)
pilot_data %>%
  group_by(pairing, male) %>%
  summarise(simulated_data = rnorm(mean = dv_mean, sd = dv_sd, n = 300),
            .groups = 'drop')

NOTE: Each pairing/male group here has exactly one row, so dv_mean and dv_sd are single values within each group, and rnorm draws all 300 values from that one distribution.
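For dplyr 1.1.0 or later, the same idea can be sketched with reframe(), which is designed to return any number of rows per group and always returns an ungrouped result. This is a minimal sketch, assuming that dplyr version is installed; the seed value and the name simulated are arbitrary choices for illustration, not part of the original answer.

```r
library(dplyr)

# Pilot summary from the question
pilot_data <- read.table(text = "pairing male dv_mean dv_sd
AA  0   1.4377551   11.99576
AA  1   0.1745918   10.03553
AB  0   12.6574286  17.76540
AB  1   9.5337037   13.92486
BA  0   8.8971111   16.49538
BA  1   8.8706557   17.13532
BB  0   1.6339286   12.72830
BB  1   -0.1433333  13.68828", header = TRUE)

set.seed(1)  # arbitrary seed, only so the draws are reproducible

# reframe() (dplyr >= 1.1.0) allows 300 rows per one-row group
simulated <- pilot_data %>%
  group_by(pairing, male) %>%
  reframe(simulated_data = rnorm(n = 300, mean = dv_mean, sd = dv_sd))
```

The result has one row per simulated value, with the pairing and male columns repeated for each draw.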

Another option is to use rowwise, return a list column, and then unnest it. This also works when a group has more than one row, because each row is simulated separately.

library(tidyr)
pilot_data %>%
  rowwise() %>%
  mutate(simulated_data = list(rnorm(mean = dv_mean, sd = dv_sd, n = 300))) %>%
  unnest(simulated_data)
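Values drawn with rnorm only approximate the target mean and standard deviation in any given sample. If each simulated cell should reproduce the pilot values exactly, one possible approach (an assumption on my part, not part of the answer above) is to standardise each draw with scale() and then shift and rescale it. The names exact and check and the seed value are hypothetical, chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Pilot summary from the question
pilot_data <- read.table(text = "pairing male dv_mean dv_sd
AA  0   1.4377551   11.99576
AA  1   0.1745918   10.03553
AB  0   12.6574286  17.76540
AB  1   9.5337037   13.92486
BA  0   8.8971111   16.49538
BA  1   8.8706557   17.13532
BB  0   1.6339286   12.72830
BB  1   -0.1433333  13.68828", header = TRUE)

set.seed(42)  # arbitrary seed, only so the draws are reproducible

exact <- pilot_data %>%
  rowwise() %>%
  # scale() gives the draw a sample mean of 0 and sample sd of 1;
  # multiplying by dv_sd and adding dv_mean then hits the targets exactly
  mutate(simulated_data = list(
    dv_mean + dv_sd * as.numeric(scale(rnorm(300)))
  )) %>%
  ungroup() %>%
  unnest(simulated_data)

# Compare simulated cell statistics with the pilot targets
check <- exact %>%
  group_by(pairing, male) %>%
  summarise(sim_mean = mean(simulated_data),
            sim_sd = sd(simulated_data),
            target_mean = first(dv_mean),
            target_sd = first(dv_sd),
            .groups = "drop")
```

In check, sim_mean and sim_sd should agree with the target columns up to floating-point error. Forcing an exact match removes sampling variability in the cell statistics, so whether it is appropriate depends on what the simulation is for.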
