How to filter by a combination of two conditions

Question

I have a survey of households in various cities in every state. Some cities have only a few respondents, so I want to drop the cities with 5 or fewer. I tried the code below, but some city names occur in more than one state (Paris, Idaho, has only 2 respondents, but Paris, Texas, has 13), and my filter keeps both.

How can I filter out Paris, Idaho but not Paris, Texas?

city_tally <- scores %>%
  group_by(state, city) %>%
  tally()

enough_samples <- city_tally %>%
  filter(n > 5) %>%
  select(state, city, n)

scores <- scores %>%
  group_by(state) %>%
  filter(city %in% enough_samples$city)
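The attempt above keeps Paris, Idaho because `city %in% enough_samples$city` compares city names only, and "Paris" is in `enough_samples` thanks to Texas. A minimal sketch of a fix that keeps the same two-step shape is to match on both columns with dplyr's `semi_join()`. The toy `scores` data frame is hypothetical, made up only to reproduce the Paris case, and the sketch assumes dplyr is installed. `count(state, city)` is used as shorthand for `group_by(state, city) %>% tally()`.

```r
library(dplyr)

# Hypothetical toy data: Paris, Idaho has 2 rows, Paris, Texas has 6
scores <- data.frame(
  state = c(rep("Idaho", 2), rep("Texas", 6)),
  city  = "Paris",
  score = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Count respondents per (state, city) pair
city_tally <- scores %>%
  count(state, city)

enough_samples <- city_tally %>%
  filter(n > 5)

# Matching on city alone keeps both Paris cities (8 rows)
buggy <- scores %>%
  filter(city %in% enough_samples$city)

# semi_join keeps rows whose (state, city) pair appears in enough_samples;
# it matches on both key columns and adds no columns of its own
filtered <- scores %>%
  semi_join(enough_samples, by = c("state", "city"))

filtered
```

Here `filtered` contains only the 6 Paris, Texas rows.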

akrun · Accepted Answer

One option is to do this in a single chain: after grouping by 'state' and 'city', create the frequency column ('n') with mutate(), then regroup by 'state' and filter on 'n'.

library(dplyr)
scores %>% 
   group_by(state, city) %>%
   mutate(n = n()) %>%
   group_by(state) %>% 
   filter(n > 5) %>%
   select(-n) # if it is not required to have the 'n' column

Output (based on @Brandon's reproducible example):

# A tibble: 13 x 3
# Groups: state [2]
#   city   state    scores
# 1 Paris  Texas     4.73 
# 2 Paris  Texas     0.657
# 3 Paris  Texas     5.32 
# 4 Paris  Texas     0.718
# 5 Paris  Texas     6.95 
# 6 Paris  Texas     6.30 
# 7 Yew    Maryland -3.96 
# 8 Yew    Maryland  6.48 
# 9 Yew    Maryland  3.78 
#10 Yew    Maryland  3.38 
#11 Yew    Maryland -1.88 
#12 Yew    Maryland  2.09 
#13 Yew    Maryland  5.67
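Because 'n' has the same value on every row of a given (state, city) pair, the rows kept by `filter(n > 5)` do not depend on the second `group_by(state)`; that step only sets the grouping of the result (hence `# Groups: state [2]` above). A shorter sketch of the same idea filters on `n()` directly inside the state/city groups and then drops the grouping. The toy data below is hypothetical (it is not @Brandon's example), and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: Paris, Idaho (2 rows), Paris, Texas (6 rows),
# Yew, Maryland (7 rows)
scores <- data.frame(
  city   = c(rep("Paris", 8), rep("Yew", 7)),
  state  = c(rep("Idaho", 2), rep("Texas", 6), rep("Maryland", 7)),
  scores = seq_len(15)
)

# n() is the number of rows in the current (state, city) group, so this
# keeps only the pairs with more than 5 respondents
kept <- scores %>%
  group_by(state, city) %>%
  filter(n() > 5) %>%
  ungroup()

kept
```

If you want to keep the count column, `add_count(state, city)` followed by `filter(n > 5)` is another option that avoids explicit grouping.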
