Reputation: 45
I have a data frame with 71 groups, 4 observations per group, and 18 variables. I would like to remove an entire group if any observation in that group falls below a particular value in any of 4 different variables, all of which contain the same string in their name. Here's a simplified version:
df <- data.frame(group=letters[c(1, 1, 1, 2, 2, 2, 3, 3, 3)],
var.one=c(111, 100, 98, 93, 99, 101, 100, 99, 97),
var.two=c(102, 96, 99, 100, 101, 102, 99, 90, 101),
other=1:9)
I would like to keep all members of any group where all variables that contain "var" are greater than 95, and drop all members of any group where any variable that contains "var" is less than 95. This should leave me with just group a:
group var.one var.two other
1 a 111 102 1
2 a 100 96 2
3 a 98 99 3
I can easily filter individual rows that match these conditions like this:
df %>% filter_at(vars(contains('var')), all_vars(. >=95))
But of course this does not remove the entire group associated with it. I can also easily exclude entire groups that don't match for a single variable:
df %>% group_by(group) %>% filter(!any(var.one <95))
But of course that only works for a single variable, not multiple variables.
How do I combine these two approaches?
Upvotes: 1
Views: 293
Reputation: 389275
You can use if_all:
library(dplyr)
df %>%
group_by(group) %>%
filter(if_all(starts_with('var'), ~all(.x > 95))) %>%
ungroup
# group var.one var.two other
# <chr> <dbl> <dbl> <int>
#1 a 111 102 1
#2 a 100 96 2
#3 a 98 99 3
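For comparison, the same group-level rule can be sketched in base R without dplyr. This is only an illustration on the question's toy data: `threshold`, `var_cols`, `row_ok` and `group_ok` are names made up here, and the strict `> 95` comparison follows the answer above rather than the `>=` used in the question's `filter_at` attempt.

```r
# Same toy data as in the question
df <- data.frame(group   = letters[c(1, 1, 1, 2, 2, 2, 3, 3, 3)],
                 var.one = c(111, 100, 98, 93, 99, 101, 100, 99, 97),
                 var.two = c(102, 96, 99, 100, 101, 102, 99, 90, 101),
                 other   = 1:9)

threshold <- 95                                    # illustrative name for the cutoff
var_cols  <- grep("var", names(df), value = TRUE)  # columns whose name contains "var"

# TRUE for rows where no "var" column is at or below the threshold
row_ok <- rowSums(df[var_cols] <= threshold) == 0

# A group passes only if every one of its rows passes;
# ave() repeats the per-group result for each row of that group
group_ok <- as.logical(ave(row_ok, df$group, FUN = all))

result <- df[group_ok, ]
result
```

Keeping the row-level check (`row_ok`) separate from the group-level check (`group_ok`) makes it easy to swap `all` for `any`, or to change the comparison, without touching the rest.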
Upvotes: 1
Reputation: 206566
With the latest version of dplyr, you can do:
df %>%
group_by(group) %>%
filter(across(contains('var'), ~all(.>95)))
The across() function is basically the replacement for filter_at and all_vars in later releases. For more info about the function, consult the ?across help page.
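Another way to combine the two approaches from the question is to do it in two explicit steps: first find the groups that contain at least one offending row, then drop those groups. The sketch below assumes dplyr 1.0.4 or later (for `if_any()`); `bad_groups` is just an illustrative name. As far as I know, newer dplyr releases also deprecate `across()` inside `filter()` in favour of `if_any()`/`if_all()`, which is why `if_any()` is used here.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(group   = letters[c(1, 1, 1, 2, 2, 2, 3, 3, 3)],
                 var.one = c(111, 100, 98, 93, 99, 101, 100, 99, 97),
                 var.two = c(102, 96, 99, 100, 101, 102, 99, 90, 101),
                 other   = 1:9)

# Step 1: groups with at least one row where any "var" column is 95 or lower
bad_groups <- df %>%
  filter(if_any(contains("var"), ~ .x <= 95)) %>%
  distinct(group)

# Step 2: keep only rows whose group is not in that list
result <- df %>%
  anti_join(bad_groups, by = "group")

result
```

This is a bit longer than a single grouped `filter()`, but the intermediate `bad_groups` table is handy when you also want to report which groups were dropped.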
Upvotes: 2