Georg Heiler

Reputation: 17724

filter column by count of distinct values

I have a data frame like:

library(dplyr)  # provides %>%, group_by(), summarise(), filter(), count()
my_tibble <- tibble::tibble(A = c('a', 'a', 'b', 'a', 'a', 'b', 'c'), B = c(1, 0, 0, 1, 1, 1, 1))

Then I can compute the mean of B for each value of A (the share of rows where B is 1):

my_tibble %>%
  group_by(A) %>%
  summarise(percentage = mean(B)) %>%
  filter(percentage > 0)

How can I remove groups like c, which consist of only a single observation (or too few observations) to produce a meaningful percentage?

my_tibble %>%
  count(A) %>%
  mutate(prop = prop.table(n))

This is a first attempt at identifying these records, but I am unsure how to use it in a filter condition.

Upvotes: 0

Views: 463

Answers (1)

akuiper

Reputation: 215117

You can add another column in the summarise() call that counts the number of records per group, and then filter on it:

my_tibble %>% 
    group_by(A) %>% 
    summarise(percentage = mean(B), n = n()) %>% 
    filter(percentage > 0, n > 1)

# A tibble: 2 x 3
#      A percentage     n
#  <chr>      <dbl> <int>
#1     a       0.75     4
#2     b       0.50     2
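
If the count column is not needed in the result, the small groups can also be dropped before summarising. The following is a minimal sketch of that variant, not part of the original answer; `min_n` is a hypothetical threshold name introduced here for illustration, and the code assumes dplyr is installed.

```r
library(dplyr)

my_tibble <- tibble::tibble(
  A = c('a', 'a', 'b', 'a', 'a', 'b', 'c'),
  B = c(1, 0, 0, 1, 1, 1, 1)
)

# Hypothetical threshold: minimum number of rows a group needs to be kept
min_n <- 2

result <- my_tibble %>%
  group_by(A) %>%
  filter(n() >= min_n) %>%              # n() is the group size; drops all rows of small groups
  summarise(percentage = mean(B)) %>%   # one row per remaining group
  filter(percentage > 0)

print(result)
```

With the sample data, group c has a single row and is removed before the mean is computed, so only a and b remain. Raising `min_n` tightens the requirement without touching the rest of the pipeline.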

Upvotes: 2
