Reputation: 177
This question has been answered before, but the solutions are not working for my particular situation. My data looks like this:
col1 | col2
A | 0
B | 1
A | 0
A | 1
B | 0
I'm basically looking for this:
col1 | col2 | Percentage
A | 0 | 0.67
A | 1 | 0.33
B | 0 | 0.50
B | 1 | 0.50
Both columns are factors. The following solution is what I keep finding on other threads:
df %>% group_by(col1, col2) %>% summarise(n=n()) %>% mutate(freq = n / sum(n))
or something along those lines.
In fact, group_by doesn't seem to be doing anything at all: the result has no 'n' or 'freq' column. I don't know what I'm doing wrong. Is it because I'm working with factors? Also, if it's not obvious, the values shown in the columns are hypothetical.
Upvotes: 1
Views: 94
Reputation: 887971
An option would be to get the row count after grouping by 'col1', then add 'col2' as a second grouping column and divide each (col1, col2) count by the count already stored for its 'col1' group:
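library(dplyr)

df %>%
  group_by(col1) %>%
  mutate(n = n()) %>%                 # rows per col1 level, stored on every row
  group_by(col2, .add = TRUE) %>%     # dplyr >= 1.0.0; older versions use add = TRUE
  summarise(freq = n() / n[1])        # rows per (col1, col2) over rows per col1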
# A tibble: 4 x 3
# Groups:   col1 [2]
#   col1   col2  freq
#   <chr> <int> <dbl>
# 1 A         0 0.667
# 2 A         1 0.333
# 3 B         0 0.5
# 4 B         1 0.5
df <- structure(list(col1 = c("A", "B", "A", "A", "B"),
                     col2 = c(0L, 1L, 0L, 1L, 0L)),
                class = "data.frame", row.names = c(NA, -5L))
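For comparison, here is a minimal sketch of the same calculation using count() followed by a grouped mutate(). It rebuilds the example data with both columns as factors, as described in the question, to suggest that factors by themselves should not be the problem. The name `result` and the explicit `dplyr::` prefixes are my own additions; the prefixes are only a precaution in case another attached package is masking these verbs, which I can't confirm from the question.

```r
library(dplyr)

# Example data from the question, with both columns stored as factors
df <- data.frame(col1 = factor(c("A", "B", "A", "A", "B")),
                 col2 = factor(c(0, 1, 0, 1, 0)))

result <- df %>%
  dplyr::count(col1, col2) %>%                  # one row per (col1, col2) pair, count in n
  dplyr::group_by(col1) %>%                     # proportions are taken within each col1 level
  dplyr::mutate(Percentage = n / sum(n)) %>%    # share of each col2 value inside its col1 group
  dplyr::ungroup()

print(result)
```

The Percentage column should match the freq values shown above, up to rounding.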
Upvotes: 1