user10939484

Reputation: 177

Computing Percentages of each Subgroup

This question has been answered before, but the solutions are not working for my particular situation.

col1   |   col2
 A     |    0
 B     |    1
 A     |    0
 A     |    1
 B     |    0

I'm basically looking for this:

col1   |   col2   |   Percentage
 A     |    0     |      0.67
 A     |    1     |      0.33
 B     |    0     |      0.50
 B     |    1     |      0.50

Both columns are factors. The following solution is what I keep finding on other threads:

df %>% group_by(col1, col2) %>% summarise(n=n()) %>% mutate(freq = n / sum(n))
or something along those lines.

In fact, group_by doesn't really seem to be doing anything at all. It's not giving me an 'n' or 'freq' column. Don't know what I'm doing wrong. Is it because I'm working with factors? Also, if it's not obvious, the values provided in the columns are hypothetical.

Upvotes: 1

Views: 94

Answers (1)

akrun

Reputation: 887971

One option is to count the rows in each 'col1' group first, then add 'col2' as a second grouping column and divide the size of each (col1, col2) group by the 'col1' count that was already created

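library(dplyr)
df %>% 
   group_by(col1) %>%
   # n = number of rows in each col1 group, repeated on every row of that group
   mutate(n = n()) %>%
   # `.add = TRUE` needs dplyr >= 1.0.0; older versions spell it `add = TRUE`
   group_by(col2, .add = TRUE) %>% 
   # n() is now the (col1, col2) group size; n[1] is the col1 total from above
   summarise(freq = n()/n[1])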
# A tibble: 4 x 3
# Groups:   col1 [2]
#  col1   col2  freq
#  <chr> <int> <dbl>
#1 A         0 0.667
#2 A         1 0.333
#3 B         0 0.5  
#4 B         1 0.5  
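
For comparison, here is a minimal sketch of a different dplyr route to the same numbers: count each (col1, col2) pair with count(), then divide by the per-col1 total. This is not the approach used above, just an alternative; the name `result` is made up for the illustration, and the data frame is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(col1 = c("A", "B", "A", "A", "B"),
                 col2 = c(0L, 1L, 0L, 1L, 0L))

result <- df %>%
  count(col1, col2) %>%         # one row per (col1, col2) pair, size in column n
  group_by(col1) %>%            # regroup by col1 only
  mutate(freq = n / sum(n)) %>% # share of each pair within its col1 group
  ungroup()

result
```

The `n` column is kept next to `freq`, which can help when checking that the proportions within each col1 group add up to 1.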

data

df <- structure(list(col1 = c("A", "B", "A", "A", "B"), col2 = c(0L, 
1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, -5L
))
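
As a cross-check that does not depend on dplyr, the same proportions can be sketched in base R with table() and prop.table(). The names `counts`, `props` and `result` are made up for the illustration, and the column name "Percentage" is taken from the desired output in the question.

```r
# Same example data as above
df <- data.frame(col1 = c("A", "B", "A", "A", "B"),
                 col2 = c(0L, 1L, 0L, 1L, 0L))

# Cross-tabulate, then divide each row (margin = 1, i.e. each col1 level) by its total
counts <- table(col1 = df$col1, col2 = df$col2)
props  <- prop.table(counts, margin = 1)

# Back to long format, one row per (col1, col2) pair
result <- as.data.frame(props, responseName = "Percentage")
result <- result[order(result$col1, result$col2), ]

result
```

Note that as.data.frame() on a table returns col1 and col2 as factors, and that pairs which never occur would show up with a proportion of 0 rather than being dropped.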

Upvotes: 1
