This is my tibble:
df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1,2,3,4,6,8))
df
# A tibble: 6 x 2
  x         y
  <chr> <dbl>
1 a         1
2 a         2
3 a         3
4 b         4
5 b         6
6 b         8
I want to compute the population standard deviation (sd) of y within each group of x.
I tried it with this formula, where n is the number of values:
sqrt((n-1)/n) * sd(x)
together with dplyr, like this:
df %>%
group_by(x) %>%
summarise(sd = sqrt((length(df$y)-1)/length(df$y)) * sd(y)) %>%
ungroup()
# A tibble: 2 x 2
  x        sd
* <chr> <dbl>
1 a     0.913
2 b     1.83
Of course this is incorrect: length(df$y) refers to the whole column, not to the current group, so it uses n = 6 instead of n = 3. I should get:
a = 0.8164966
b = 1.632993
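These target values follow from the definition: R's sd() divides by n - 1, so multiplying by sqrt((n - 1) / n) switches to the population denominator n. A minimal sketch checking both groups by hand; `pop_sd` and `pop_sd_direct` are hypothetical helper names, not functions from base R or dplyr.

```r
# Hypothetical helper: population sd by rescaling R's sample sd (n - 1 denominator)
pop_sd <- function(v) {
  n <- length(v)
  sqrt((n - 1) / n) * sd(v)
}

# Hypothetical helper: population sd from the definition (n denominator)
pop_sd_direct <- function(v) sqrt(mean((v - mean(v))^2))

pop_sd(c(1, 2, 3))         # group a: 0.8164966
pop_sd_direct(c(4, 6, 8))  # group b: 1.632993
```

Both routes agree because scaling the n - 1 variance by (n - 1) / n gives exactly the mean squared deviation.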
Edit:
The output should be a tibble containing the grouping variable and the sd of each group.
You can use dplyr's n() function, which returns the number of rows in the current group:
df %>%
  group_by(x) %>%
  summarise(sd = sqrt((n() - 1) / n()) * sd(y)) %>%
  ungroup()
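A self-contained version of the answer, assuming dplyr is installed (dplyr re-exports tibble()). The extra columns `sd_len` and `sd_direct` are illustrative cross-checks that are not part of the original answer. `sd_len` shows that the original attempt also works once it refers to the bare column y, which summarise() evaluates per group, instead of df$y. `sd_direct` uses the definition with denominator n.

```r
library(dplyr)

df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1, 2, 3, 4, 6, 8))

result <- df %>%
  group_by(x) %>%
  summarise(
    # n() is the number of rows in the current group (3 here, not 6)
    sd = sqrt((n() - 1) / n()) * sd(y),
    # bare y is the current group's slice, unlike df$y;
    # stats::sd avoids any clash with the sd column created just above
    sd_len = sqrt((length(y) - 1) / length(y)) * stats::sd(y),
    # cross-check from the definition: divide by n instead of n - 1
    sd_direct = sqrt(mean((y - mean(y))^2))
  )

result
```

All three columns should agree, giving about 0.8165 for a and 1.6330 for b. With a single grouping variable, summarise() already returns an ungrouped tibble, so the trailing ungroup() in the answer is harmless but optional.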