hoho

Reputation: 107

Set rnorm parameters equal to vector

I have a data frame that contains columns of sample sizes, means, and standard deviations, as well as a target value:

ssize <- c(200, 300, 150)
mean <- c(10, 40, 50)
sd <- c(5, 15, 65)
target <- c(7, 23, 30)
df <- data.frame(ssize, mean, sd, target)

I wish to add another variable, below, that holds the number of draws less than the target value, where the draws for each row come from a normal distribution with that row's mean and sd and a sample size of ssize. However, I cannot get rnorm to use the values of each row as parameters. For example, running

df$below <- sum(rnorm(df$ssize, df$mean, df$sd) < df$target)

draws a single sample whose size is length(df$ssize), not one sample per row of size df$ssize, and sum() then reduces the comparison to one number that is recycled down the whole column.
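A minimal sketch of what happens in that call, using the same vectors as above. The seed is arbitrary and only there so the sketch is repeatable. When the first argument of rnorm has length greater than one, its length is taken as the number of draws.

```r
set.seed(1)  # arbitrary seed, only for repeatability of this sketch

ssize  <- c(200, 300, 150)
mean   <- c(10, 40, 50)
sd     <- c(5, 15, 65)
target <- c(7, 23, 30)

# n has length 3, so rnorm draws 3 values (one per mean/sd pair), not 200 + 300 + 150
draws <- rnorm(ssize, mean, sd)
length(draws)
#> [1] 3

# sum() collapses the element-wise comparison to a single number
below <- sum(draws < target)
length(below)
#> [1] 1
```

So the call has to be made once per row, which is what the answers below do.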

Update: data.table solution for large datasets?

The solutions from @alistaire and @G5W work well, but I would like to extract the mean value of below from 100 replicates of rnorm, for each row. I tried both solutions:

df <- df %>% mutate(below = rowMeans(replicate(100, pmap_int(., ~sum(rnorm(..1, ..2, ..3) < ..4)))))

df$below <- with(df, sapply(seq_len(nrow(df)), function(i) mean(replicate(100, sum(rnorm(n[i], mean[i], sd[i]) < target[i])))))

But both take a very long time to run on my dataset, which has more than 4.3 million rows. Is there a data.table (or other) solution that might be faster?

Upvotes: 2

Views: 930

Answers (2)

G5W

Reputation: 37661

You can do this in base R with sapply and an anonymous function:

df$below <- with(df,
    sapply(seq_len(nrow(df)), function(i) sum(rnorm(ssize[i], mean[i], sd[i]) < target[i])))
df$below    # no seed is set, so the counts vary from run to run
[1] 44 45 48
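The same row-by-row idea can also be written without explicit indices. The following is only a sketch: count_below is a hypothetical helper name introduced here, and the seed is arbitrary. mapply calls the helper once per row, passing the corresponding element of each column.

```r
set.seed(1)  # arbitrary seed, only for repeatability of this sketch

df <- data.frame(ssize  = c(200, 300, 150),
                 mean   = c(10, 40, 50),
                 sd     = c(5, 15, 65),
                 target = c(7, 23, 30))

# Hypothetical helper: draw n values and count how many fall below target
count_below <- function(n, mean, sd, target) sum(rnorm(n, mean, sd) < target)

# mapply walks the four columns in parallel, one call per row
df$below <- mapply(count_below, df$ssize, df$mean, df$sd, df$target)
```

This still calls rnorm once per row, so it is not expected to be meaningfully faster than sapply; it only removes the index bookkeeping.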

Upvotes: 1

alistaire

Reputation: 43354

List columns are a natural way to do this, because they let you store the samples right next to the parameters that generated them. Using purrr for iteration:

library(tidyverse)
set.seed(47)    # for reproducibility

df <- tibble(n = c(200, 300, 150),    # rename to name of parameter in rnorm so pmap works naturally; tibble() replaces the deprecated data_frame()
             mean = c(10, 40, 50), 
             sd = c(5, 15, 65), 
             target = c(7, 23, 30))

df %>% 
    mutate(samples = pmap(.[1:3], rnorm),    # iterate in parallel over parameters and store samples as list column
           below = map2_int(samples, target, ~sum(.x < .y)))    # iterate over samples and target, calculate number below, and simplify to integer vector
#> # A tibble: 3 x 6
#>       n  mean    sd target samples     below
#>   <dbl> <dbl> <dbl>  <dbl> <list>      <int>
#> 1   200    10     5      7 <dbl [200]>    47
#> 2   300    40    15     23 <dbl [300]>    41
#> 3   150    50    65     30 <dbl [150]>    58
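Storing every sample becomes expensive for the multi-million-row case raised in the question's update. If only the counts are needed, one possible shortcut, which is a different technique from the list-column approach above, is to skip rnorm entirely: the number of n independent normal draws that fall below target follows a Binomial(n, p) distribution with p = pnorm(target, mean, sd), and rbinom is vectorized over its size and prob arguments. The sketch below assumes the same column names as above; p, reps, below_mean, and below_expected are names introduced here for illustration. The results match the simulation in distribution, not draw for draw.

```r
set.seed(47)  # for reproducibility

df <- data.frame(n      = c(200, 300, 150),
                 mean   = c(10, 40, 50),
                 sd     = c(5, 15, 65),
                 target = c(7, 23, 30))

# Probability that a single draw falls below target, one value per row
p <- pnorm(df$target, df$mean, df$sd)

# One simulated count per row, without generating the individual samples
df$below <- rbinom(nrow(df), size = df$n, prob = p)

# Mean count over 100 replicates: the sum of 100 independent Binomial(n, p)
# counts is Binomial(100 * n, p), so a single draw per row is enough
reps <- 100
df$below_mean <- rbinom(nrow(df), size = reps * df$n, prob = p) / reps

# Exact expected count, if no simulation noise is wanted at all
df$below_expected <- df$n * p
```

Because every step is a single vectorized call, this avoids the per-row loop altogether, and the same expressions could be used inside a data.table or dplyr pipeline.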

Upvotes: 2
