Reputation: 11657
Let's say I have the following summary of pilot data:
pilot_data = read.table(text = "pairing male dv_mean dv_sd
AA 0 1.4377551 11.99576
AA 1 0.1745918 10.03553
AB 0 12.6574286 17.76540
AB 1 9.5337037 13.92486
BA 0 8.8971111 16.49538
BA 1 8.8706557 17.13532
BB 0 1.6339286 12.72830
BB 1 -0.1433333 13.68828", header = T)
I'd like to create a simulated dataset in dplyr that, for each pairing/male combination, has the same mean and standard deviation as that cell. So, for example, if I wanted 300 rows for each pairing/male combination, I'd do something like:
tester = pilot_data %>%
  group_by(pairing, male) %>%
  mutate(simulated_data = rnorm(mean = dv_mean, sd = dv_sd, n = 300))
Except this obviously won't work because of a recycling error. I can use a for loop to do this and append a dataset to itself over and over again, but I'm trying to learn how to do this in a dplyr chain.
What's the best way to achieve this?
Upvotes: 1
Views: 71
Reputation: 101343
Here is a data.table option:
> setDT(pilot_data)[, .(simulated_data = rnorm(300, dv_mean, dv_sd)), .(pairing, male)]
pairing male simulated_data
1: AA 0 -11.068416
2: AA 0 -4.925878
3: AA 0 -11.044629
4: AA 0 -7.946300
5: AA 0 3.352702
---
2396: BB 1 8.966713
2397: BB 1 -14.925273
2398: BB 1 -11.957720
2399: BB 1 17.335359
2400: BB 1 17.824735
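For reproducibility you may also want to set a seed first; using keyby instead of by additionally sorts the result by group. A minimal sketch (the seed value is arbitrary and the sim_dt name is just for illustration):

```r
library(data.table)

set.seed(2023)   # arbitrary seed, so the draws are reproducible
sim_dt <- setDT(pilot_data)[, .(simulated_data = rnorm(300, dv_mean, dv_sd)),
                            keyby = .(pairing, male)]

# Each of the 8 pairing/male cells gets 300 draws: 2400 rows in total
sim_dt[, .N, by = .(pairing, male)]
```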
Upvotes: 1
Reputation: 887118
We can use summarise instead of mutate, since summarise can return more than one row per group, whereas mutate must return the same number of rows as the original data (note that in dplyr 1.1+, reframe() is the recommended replacement for multi-row summarise results):
library(dplyr)
pilot_data %>%
  group_by(pairing, male) %>%
  summarise(simulated_data = rnorm(mean = dv_mean, sd = dv_sd, n = 300),
            .groups = 'drop')
NOTE: this works here because each group has exactly one row, so rnorm gets a single value for mean and sd.
Or another option is to use rowwise, return a list column, and then unnest (in case there are duplicate rows for groups):
library(tidyr)
pilot_data %>%
  rowwise() %>%
  mutate(simulated_data = list(rnorm(mean = dv_mean, sd = dv_sd, n = 300))) %>%
  unnest(c(simulated_data))
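As a quick sanity check (a sketch, with an arbitrary seed and a hypothetical simulated name), you can confirm that each simulated cell lands near the pilot mean and SD; with n = 300 per cell the sample statistics will be close but not exact:

```r
library(dplyr)
library(tidyr)

set.seed(42)   # arbitrary seed, so the draws are reproducible
simulated <- pilot_data %>%
  rowwise() %>%
  mutate(simulated_data = list(rnorm(mean = dv_mean, sd = dv_sd, n = 300))) %>%
  unnest(c(simulated_data))

# Per-cell sample statistics, to compare against dv_mean and dv_sd
simulated %>%
  group_by(pairing, male) %>%
  summarise(sim_mean = mean(simulated_data),
            sim_sd   = sd(simulated_data),
            .groups  = "drop")
```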
Upvotes: 2