Computing Percentages of each Subgroup

Question

This question has been answered before, but the existing solutions aren't working for my particular situation.

col1   |   col2
 A     |    0
 B     |    1
 A     |    0
 A     |    1
 B     |    0

I'm looking for the share of each col2 value within each col1 group, like this:

col1   |   col2   |   Percentage
 A     |    0     |      0.67
 A     |    1     |      0.33
 B     |    0     |      0.50
 B     |    1     |      0.50
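For reference, the target table is the row-wise proportion of a col1 × col2 cross-tabulation. Below is a minimal base R sketch, assuming both columns are factors as described and using the hypothetical values above; it is one way to get there, not the method from the answer.

```r
# Hypothetical sample data mirroring the question, with both columns as factors
df <- data.frame(
  col1 = factor(c("A", "B", "A", "A", "B")),
  col2 = factor(c(0, 1, 0, 1, 0))
)

# Cross-tabulate col1 by col2, then divide each cell by its row (col1) total
tab <- prop.table(table(col1 = df$col1, col2 = df$col2), margin = 1)

# Convert the 2-D table to long format; responseName names the proportion column
out <- as.data.frame(tab, responseName = "Percentage")

# Sort so that rows run A-0, A-1, B-0, B-1 as in the desired output
out <- out[order(out$col1, out$col2), ]
print(out, row.names = FALSE)
```

`margin = 1` normalizes within rows (col1); `margin = 2` would normalize within col2 instead.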

Both columns are factors. The following solution is what I keep finding on other threads:

df %>% group_by(col1, col2) %>% summarise(n=n()) %>% mutate(freq = n / sum(n))
or something along those lines.

In fact, group_by doesn't seem to do anything at all: I don't get an 'n' or 'freq' column. I don't know what I'm doing wrong. Is it because I'm working with factors? Also, in case it isn't obvious, the values in the columns are hypothetical.
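For what it's worth, the pipeline above is valid dplyr, and factors are not the problem: summarise() drops the last grouping variable (col2), so the following mutate() computes sum(n) within each col1 group. Two common causes of the symptom described, both assumptions here since the question doesn't show the session, are plyr being attached after dplyr (its summarise() and mutate() then mask dplyr's and ignore group_by()) and the result never being assigned, so df itself is unchanged. A minimal sketch that rules out both by prefixing each verb with dplyr:: and storing the result:

```r
library(dplyr)

# Hypothetical sample data from the question, with both columns as factors
df <- data.frame(
  col1 = factor(c("A", "B", "A", "A", "B")),
  col2 = factor(c(0, 1, 0, 1, 0))
)

# Explicit dplyr:: prefixes avoid any masking by plyr; the result is assigned to out
out <- df %>%
  dplyr::group_by(col1, col2) %>%
  dplyr::summarise(n = dplyr::n(), .groups = "drop_last") %>%  # drops col2, keeps col1 grouping
  dplyr::mutate(freq = n / sum(n)) %>%                          # sum(n) is taken within each col1 group
  dplyr::ungroup()

print(out)
```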

akrun · Accepted Answer

One option is to get the frequency count after grouping by 'col1', then add 'col2' as a second grouping column and divide the count for each (col1, col2) combination by the col1 count created earlier:

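library(dplyr)
# `.add` replaces the deprecated `add` argument of group_by() (dplyr >= 1.0.0)
df %>% 
   group_by(col1) %>%
   mutate(n = n()) %>%                  # n = number of rows in each col1 group
   group_by(col2, .add = TRUE) %>%      # add col2 to the existing col1 grouping
   summarise(freq = n()/n[1])           # rows per (col1, col2) divided by rows per col1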
# A tibble: 4 x 3
# Groups:   col1 [2]
#  col1   col2  freq
#  <chr> <int> <dbl>
#1 A         0 0.667
#2 A         1 0.333
#3 B         0 0.5  
#4 B         1 0.5

data

df <- structure(list(col1 = c("A", "B", "A", "A", "B"), col2 = c(0L, 
1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, -5L
))
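A shorter variant of the same idea, sketched here as an alternative rather than the answer's method, is to let count() produce the (col1, col2) counts and then divide by the col1 totals:

```r
library(dplyr)

# The data from the answer: col1 is character, col2 is integer
df <- structure(list(col1 = c("A", "B", "A", "A", "B"),
                     col2 = c(0L, 1L, 0L, 1L, 0L)),
                class = "data.frame", row.names = c(NA, -5L))

out <- df %>%
  count(col1, col2) %>%                 # one row per (col1, col2), count in n
  group_by(col1) %>%                    # regroup by col1 only
  mutate(Percentage = n / sum(n)) %>%   # share of each col2 value within its col1 group
  ungroup()

print(out)
```

If the columns are factors, count(col1, col2, .drop = FALSE) also keeps factor-level combinations that never occur, with n = 0.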
