Reputation: 117
I have a data frame like this:
df <- tribble(
~x, ~y, ~z,
FALSE,"N",1,
FALSE,"N",2,
FALSE,"W",1,
FALSE,"E",3,
FALSE,"E",1,
TRUE,"N",2,
TRUE,"W",2,
TRUE,"E",1
)
Now I want to group this by the first two variables, x and y, count the rows in each group, and then add a column with each group's proportion of the total, so I tried
df %>%
group_by(x,y) %>%
summarize(count = n()) %>%
mutate(prop = count/sum(count))
But I get
tribble(
~x, ~y, ~count, ~prop,
FALSE,"E", 2, 0.4,
FALSE,"N", 2, 0.4,
FALSE,"W", 1, 0.2,
TRUE,"E", 1, 0.33,
TRUE,"N", 1, 0.33,
TRUE,"W", 1, 0.33
)
instead of
tribble(
~x, ~y, ~count, ~prop,
FALSE,"E", 2, 0.25,
FALSE,"N", 2, 0.25,
FALSE,"W", 1, 0.125,
TRUE,"E", 1, 0.125,
TRUE,"N", 1, 0.125,
TRUE,"W", 1, 0.125
)
Why does this happen?
Upvotes: 0
Views: 2258
Reputation: 388982
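Another way, without an explicit group_by(), is to use count() and then calculate the proportions. Because df is not grouped, the result of count() is not grouped either, so sum(n) is taken over all rows.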
library(dplyr)
df %>% count(x, y) %>% mutate(n = n/sum(n))
# x y n
# <lgl> <chr> <dbl>
#1 FALSE E 0.25
#2 FALSE N 0.25
#3 FALSE W 0.125
#4 TRUE E 0.125
#5 TRUE N 0.125
#6 TRUE W 0.125
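If both the count and the proportion are wanted as separate columns, as in the question's expected output, the same idea can be sketched with the name argument of count(). The column names count and prop are taken from the question; the rest is a minimal sketch rather than the only way to do it.

```r
library(dplyr)

df <- tribble(
  ~x, ~y, ~z,
  FALSE, "N", 1,
  FALSE, "N", 2,
  FALSE, "W", 1,
  FALSE, "E", 3,
  FALSE, "E", 1,
  TRUE,  "N", 2,
  TRUE,  "W", 2,
  TRUE,  "E", 1
)

# count() on an ungrouped data frame returns an ungrouped result,
# so sum(count) below runs over all rows of the summary.
result <- df %>%
  count(x, y, name = "count") %>%   # store the counts in `count` instead of `n`
  mutate(prop = count / sum(count)) # proportion of the overall total

result
```

This keeps the counts in place instead of overwriting them with the proportions.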
Upvotes: 2
Reputation: 1065
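When you group_by(x, y), you get a data frame grouped by x and y. After summarize(), you get a data frame grouped by only x, because summarize() drops the last grouping level by default. sum(count) in the mutate() is therefore computed within each value of x rather than over all rows. You need an ungroup() before the mutate() to produce the result you want.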
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tribble(
~x, ~y, ~z,
FALSE,"N",1,
FALSE,"N",2,
FALSE,"W",1,
FALSE,"E",3,
FALSE,"E",1,
TRUE,"N",2,
TRUE,"W",2,
TRUE,"E",1
)
df %>%
group_by(x,y) %>%
summarize(count = n()) %>%
ungroup() %>%
mutate(prop = count/sum(count))
#> `summarise()` regrouping output by 'x' (override with `.groups` argument)
#> # A tibble: 6 x 4
#> x y count prop
#> <lgl> <chr> <int> <dbl>
#> 1 FALSE E 2 0.25
#> 2 FALSE N 2 0.25
#> 3 FALSE W 1 0.125
#> 4 TRUE E 1 0.125
#> 5 TRUE N 1 0.125
#> 6 TRUE W 1 0.125
Created on 2020-11-23 by the reprex package (v0.3.0)
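To see the leftover grouping directly, one option is to inspect the summarised data with group_vars(). This is a small sketch using the same df; the variable name summarised is only for illustration.

```r
library(dplyr)

df <- tribble(
  ~x, ~y, ~z,
  FALSE, "N", 1,
  FALSE, "N", 2,
  FALSE, "W", 1,
  FALSE, "E", 3,
  FALSE, "E", 1,
  TRUE,  "N", 2,
  TRUE,  "W", 2,
  TRUE,  "E", 1
)

summarised <- df %>%
  group_by(x, y) %>%
  summarize(count = n())

# Grouping variables still attached after summarize();
# a following mutate() computes sum(count) within these groups.
group_vars(summarised)
#> [1] "x"

# After ungroup() no grouping variables remain.
group_vars(ungroup(summarised))
#> character(0)
```

With the data still grouped by x, the FALSE rows are divided by 5 and the TRUE rows by 3, which gives the 0.4, 0.2 and 0.33 values from the question.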
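See also the .groups argument of summarize() for more options on how to handle multiple grouping levels. Its default depends on the number of rows each group's summary returns: when every group yields exactly one row, as here, the last grouping level is dropped.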
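As a sketch of the .groups route, passing .groups = "drop" removes all grouping inside summarize(), so no separate ungroup() is needed. This assumes a dplyr version that has the .groups argument (the message in the output above suggests it is available).

```r
library(dplyr)

df <- tribble(
  ~x, ~y, ~z,
  FALSE, "N", 1,
  FALSE, "N", 2,
  FALSE, "W", 1,
  FALSE, "E", 3,
  FALSE, "E", 1,
  TRUE,  "N", 2,
  TRUE,  "W", 2,
  TRUE,  "E", 1
)

result <- df %>%
  group_by(x, y) %>%
  summarize(count = n(), .groups = "drop") %>% # return an ungrouped data frame
  mutate(prop = count / sum(count))            # sum over all rows

result
```

Setting .groups explicitly also silences the regrouping message.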
Upvotes: 3