Grouped filter based on multiple columns

Question

I have a data frame with 71 groups, 4 observations per group, and 18 variables. I would like to remove the entire group if any observation in that group has less than a particular value in any of 4 different variables, all of which contain the same string in their name. Here's a simplified version:

df <- data.frame(group=letters[c(1, 1, 1, 2, 2, 2, 3, 3, 3)], 
                 var.one=c(111, 100, 98, 93, 99, 101, 100, 99, 97),
                 var.two=c(102, 96, 99, 100, 101, 102, 99, 90, 101),
                 other=1:9)

I would like to keep all members of any group where every variable that contains "var" is greater than 95 in every row, and drop all members of any group where any variable that contains "var" is less than 95 in any row. This should leave me with just group a:

  group var.one var.two other
1     a     111     102     1
2     a     100      96     2
3     a      98      99     3

I can easily filter individual rows that match these conditions like this:

df %>% filter_at(vars(contains('var')), all_vars(. >= 95))

But of course this does not remove the entire group associated with it. I can also easily exclude entire groups that don't match for a single variable:

df %>% group_by(group) %>% filter(!any(var.one < 95))

But of course that only works for a single variable, not multiple variables.

How do I combine these two approaches?

MrFlick · Accepted Answer

With dplyr 1.0.0 or later, which introduced across(), you can do:

df %>% 
   group_by(group) %>% 
   filter(across(contains('var'), ~all(.>95)))

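Because the data is grouped, all() is evaluated once per group for each matching column, so a group is kept only if every value in every "var" column is above 95. The across() function supersedes the scoped verbs such as filter_at() and their all_vars() helper from dplyr 1.0.0 onward. For more detail, see the ?across help page.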
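As a side note, more recent dplyr releases deprecate across() inside filter() in favor of if_all() and if_any(). The following is a minimal sketch of the same grouped filter written with if_all(), assuming dplyr 1.0.4 or later; the name kept is only illustrative and is not part of the original question.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group   = letters[c(1, 1, 1, 2, 2, 2, 3, 3, 3)],
  var.one = c(111, 100, 98, 93, 99, 101, 100, 99, 97),
  var.two = c(102, 96, 99, 100, 101, 102, 99, 90, 101),
  other   = 1:9
)

# `kept` is a hypothetical name for the result.
# Within each group, all(.x > 95) yields one TRUE/FALSE per "var" column;
# if_all() combines those with AND, so the group survives only when
# every "var" column passes.
kept <- df %>%
  group_by(group) %>%
  filter(if_all(contains("var"), ~ all(.x > 95))) %>%
  ungroup()

print(kept)
```

With the example data this should keep only the three rows of group a. To treat a value of exactly 95 as acceptable, as in the filter_at() attempt from the question, change the comparison to .x >= 95.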
