Get proportions of a dataframe grouped by multiple variables in R

Question

I have a data frame like this:

df <- tribble(
  ~x, ~y, ~z,
  FALSE,"N",1,
  FALSE,"N",2,
  FALSE,"W",1,
  FALSE,"E",3,
  FALSE,"E",1,
  TRUE,"N",2,
  TRUE,"W",2,
  TRUE,"E",1
)

Now I want to group this by the first two variables and then add a proportion column, so I tried:

df %>%
  group_by(x,y) %>%
  summarize(count = n()) %>%
  mutate(prop = count/sum(count))

But I get

tribble(
  ~x, ~y, ~count, ~prop,
  FALSE, "E", 2, 0.4,
  FALSE, "N", 2, 0.4,
  FALSE, "W", 1, 0.2,
  TRUE,  "E", 1, 0.33,
  TRUE,  "N", 1, 0.33,
  TRUE,  "W", 1, 0.33
)

instead of

tribble(
  ~x, ~y, ~count, ~prop,
  FALSE, "E", 2, 0.25,
  FALSE, "N", 2, 0.25,
  FALSE, "W", 1, 0.125,
  TRUE,  "E", 1, 0.125,
  TRUE,  "N", 1, 0.125,
  TRUE,  "W", 1, 0.125
)

Why does this happen?

Andrew Brown · Accepted Answer

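group_by(x, y) gives you a data frame grouped by x and y. summarize() then drops only the last grouping level, so its result is still grouped by x, and sum(count) inside mutate() is computed within each value of x rather than over the whole table. That is why the proportions add up to 1 within each x. You need an ungroup() before the mutate() to produce the result you want.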

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- tribble(
  ~x, ~y, ~z,
  FALSE,"N",1,
  FALSE,"N",2,
  FALSE,"W",1,
  FALSE,"E",3,
  FALSE,"E",1,
  TRUE,"N",2,
  TRUE,"W",2,
  TRUE,"E",1
)

df %>%
  group_by(x,y) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  mutate(prop = count/sum(count))
#> `summarise()` regrouping output by 'x' (override with `.groups` argument)
#> # A tibble: 6 x 4
#>   x     y     count  prop
#>   <lgl> <chr> <int> <dbl>
#> 1 FALSE E         2 0.25 
#> 2 FALSE N         2 0.25 
#> 3 FALSE W         1 0.125
#> 4 TRUE  E         1 0.125
#> 5 TRUE  N         1 0.125
#> 6 TRUE  W         1 0.125

Created on 2020-11-23 by the reprex package (v0.3.0)

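See also the .groups argument of summarize() for other ways to handle the grouping of the result. For example, .groups = "drop" removes all grouping, which makes the separate ungroup() unnecessary. The default depends on how many rows each group returns: when every group returns exactly one row, as here, only the last grouping level is dropped.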
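
As a minimal sketch of the .groups route, assuming dplyr 1.0.0 or later (the version that introduced the argument), the following reuses the df from the question. The names by_x and result are only illustrative. It first checks which grouping survives summarize() by default, then drops all grouping in the summarize() call itself:

```r
library(dplyr)

df <- tribble(
  ~x, ~y, ~z,
  FALSE, "N", 1,
  FALSE, "N", 2,
  FALSE, "W", 1,
  FALSE, "E", 3,
  FALSE, "E", 1,
  TRUE,  "N", 2,
  TRUE,  "W", 2,
  TRUE,  "E", 1
)

# "drop_last" spells out the default used when each group returns one row:
# only the last grouping variable (y) is removed, so x is still a group.
by_x <- df %>%
  group_by(x, y) %>%
  summarize(count = n(), .groups = "drop_last")
group_vars(by_x)

# "drop" removes all grouping, so sum(count) runs over every row
# and no separate ungroup() is needed.
result <- df %>%
  group_by(x, y) %>%
  summarize(count = n(), .groups = "drop") %>%
  mutate(prop = count / sum(count))
result
```

Setting .groups explicitly should also silence the regrouping message that summarize() prints in the output above.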
