Doug Fir

Reputation: 21212

Add rnorm during dplyr pipeline but when setting the sd set it by groups

A dataframe:

mydf <- data.frame(
  x = rep(letters[1:3], 4),
  y = rnorm(12, 0, 3)
)

I can easily mutate a new column z that is the value of y plus or minus a random number:

mydf <- mydf %>% 
  mutate(z = rnorm(nrow(.), mean = 0, sd = sd(y)))

What I would like to do is create z as a random number, but when setting the sd, use the sd for that letter only.

Tried:

mydf <- mydf %>% 
  group_by(x) %>% 
  mutate(z = rnorm(nrow(.), mean = 0, sd = sd(y)))
Error: Problem with `mutate()` input `z`.
x Input `z` can't be recycled to size 4.
ℹ Input `z` is `rnorm(nrow(.), mean = 0, sd = sd(y))`.
ℹ Input `z` must be size 4 or 1, not 12.
ℹ The error occurred in group 1: x = "a".

How can I add z, which is the value of y plus or minus a random number with an sd equal to that of the sd for the group as opposed to the column as a whole?

Upvotes: 1

Views: 329

Answers (1)

akrun

Reputation: 887088

Here nrow(.) ignores the grouping: the . refers to the entire data frame passed through the pipe, so it returns the total number of rows (12) rather than the number of rows in each group. mutate requires the new column to have either length 1 or the same length as the current group (4 here), so the call fails unless we wrap the column in a list, which may not be what the OP wanted. The same behaviour is visible with summarise:

library(dplyr)
mydf %>% 
   group_by(x) %>% 
   summarise(n = nrow(.))
# A tibble: 3 x 2
#  x         n
#  <chr> <int>
#1 a        12 ###
#2 b        12 ###
#3 c        12 ###

Instead, we can use n(), which returns the number of rows in the current group:

mydf %>% 
  group_by(x) %>% 
  mutate(z = rnorm(n(), mean = 0, sd = sd(y)))
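The question also asks for z to be the value of y plus or minus the random number. A minimal sketch of that variant follows, assuming the noise is simply added to y. The group_sd and noise columns and the seed are illustrative additions, not part of the original post, kept only so the per-group sd can be inspected.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

mydf <- data.frame(
  x = rep(letters[1:3], 4),
  y = rnorm(12, 0, 3)
)

result <- mydf %>%
  group_by(x) %>%
  mutate(
    group_sd = sd(y),                               # sd of y within the current group
    noise    = rnorm(n(), mean = 0, sd = group_sd), # n() draws, one per row of the group
    z        = y + noise                            # y plus or minus the group-scaled noise
  ) %>%
  ungroup()

result
```

Each letter should show a single group_sd value repeated across its four rows, which confirms the sd was computed per group rather than over the whole column.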

Upvotes: 1
