user13267770

Compute population sd for grouped variables

This is my tibble:

df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1,2,3,4,6,8))
df
# A tibble: 6 x 2
  x         y
  <chr> <dbl>
1 a         1
2 a         2
3 a         3
4 b         4
5 b         6
6 b         8

I want to compute the population sd of y for each group of x.

I tried to apply this formula, where n is the number of observations:

sqrt((n-1)/n) * sd(x)

With dplyr, it looked like this:

df %>%
  group_by(x) %>%
  summarise(sd = sqrt((length(df$y)-1)/length(df$y)) * sd(y)) %>%
  ungroup()

# A tibble: 2 x 2
  x        sd
* <chr> <dbl>
1 a     0.913
2 b     1.83 

Of course this is incorrect, since length(df$y) ignores the grouping and therefore uses n = 6 instead of n = 3. I should get:

a = 0.8164966
b = 1.632993

Edit:

The output should be a tibble with the grouping variable and the sd for every group.

Upvotes: 0

Views: 36

Answers (1)

Sirius

Reputation: 5429

You can use the n() function, which returns the number of rows in the current group:

df %>%
    group_by(x) %>%
    summarise(sd = sqrt((n() - 1) / n()) * sd(y)) %>%
    ungroup()
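
As a quick check, here is a minimal sketch that wraps the correction factor in a small helper and applies it per group. It assumes dplyr and tibble are installed; the name pop_sd is only illustrative and is not part of any package. Inside summarise(), y holds only the current group's values, so length(v) is the group size.

```r
library(dplyr)
library(tibble)

df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1, 2, 3, 4, 6, 8))

# Hypothetical helper: rescale the sample sd (n - 1 denominator)
# to the population sd (n denominator).
pop_sd <- function(v) {
  n <- length(v)
  sqrt((n - 1) / n) * sd(v)
}

result <- df %>%
  group_by(x) %>%
  summarise(sd = pop_sd(y)) %>%
  ungroup()

result
```

This should give 0.8164966 for a and 1.632993 for b, matching the values expected in the question.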

Upvotes: 0
