Reputation: 17724
I have a data frame like:
library(dplyr)
my_tibble <- tibble::tibble(A = c('a','a','b','a','a','b','c'), B = c(1,0,0,1,1,1,1))
Then I can compute, for each group in A, the proportion of rows where B is 1:
my_tibble %>%
  group_by(A) %>%
  summarise(percentage = mean(B)) %>%
  filter(percentage > 0)
How can I remove groups like c, which consist of only a single observation (or too few observations) to produce a meaningful percentage?
my_tibble %>%
  count(A) %>%
  mutate(prop = prop.table(n))
This is a first attempt at identifying these records, but I am unsure how to include it in a filter condition.
Upvotes: 0
Views: 463
Reputation: 215117
You can add another column in the summarize to count the number of records per group and then filter based on it:
my_tibble %>%
  group_by(A) %>%
  summarise(percentage = mean(B), n = n()) %>%
  filter(percentage > 0, n > 1)
# A tibble: 2 x 3
# A percentage n
# <chr> <dbl> <int>
#1 a 0.75 4
#2 b 0.50 2
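As a possible variation, the small groups could also be dropped before summarising by calling filter() on the grouped data, where n() gives the size of the current group. The sketch below assumes dplyr is attached; min_n is a hypothetical threshold name introduced here for illustration, not something from the question.

```r
library(dplyr)

my_tibble <- tibble::tibble(
  A = c("a", "a", "b", "a", "a", "b", "c"),
  B = c(1, 0, 0, 1, 1, 1, 1)
)

# Hypothetical cutoff: keep only groups with at least this many rows
min_n <- 2

result <- my_tibble %>%
  group_by(A) %>%
  filter(n() >= min_n) %>%                      # drops group "c" (1 row)
  summarise(percentage = mean(B), n = n()) %>%  # per-group mean and size
  filter(percentage > 0)

result
```

With min_n set to 2 this should keep the same groups as filtering on n > 1 after the summarise; putting the threshold in a variable just makes it easier to adjust.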
Upvotes: 2