Reputation: 199
[EDIT: Included reproducible example using dput]
I want to drop entire groups, defined by a grouping variable, whenever any row in the group has one of a given set of values in a variable, i.e. no group may remain in which any row has the value V1, V2 or V3.
E.g. no group may remain that contains a tree susceptible to disease strain/injury type V1, V2 or V3.
However, my calls keep being interpreted as: no tree susceptible to disease strain/injury type V1, V2 or V3 may remain within a group.
Example 1:
df %>%
  group_by(tree_group) %>%
  filter(any(!(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))
Example 2:
df %>%
  group_by(tree_group) %>%
  filter(!(any(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))
Both Example 1 and Example 2 yield the same result: the damaged trees are removed from their groups, rather than the affected groups being removed entirely.
I also tried, without success, to create a variable (DAMAGED) that marks every tree in a group with 1 if at least one member of the group is susceptible, and with 0 otherwise:
df %>%
  group_by(tree_group) %>%
  mutate(if (tree_condition %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED = 1
  } else if (injury_type_1 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED = 1
  } else if (injury_type_2 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED = 1
  } else if (injury_type_3 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED = 1
  } else {
    DAMAGED = 0
  })
However, this produces the following warnings:
1: In if (tree_condition %in% c("Significant", "Severe", "Dead or dying")) { :
the condition has length > 1 and only the first element will be used
2: In if (injury_type_1 %in% c("Significant", "Severe", "Dead or dying")) { :
the condition has length > 1 and only the first element will be used
3: In if (injury_type_2 %in% c("Significant", "Severe", "Dead or dying")) { :
the condition has length > 1 and only the first element will be used
4: In if (injury_type_3 %in% c("Significant", "Severe", "Dead or dying")) { :
the condition has length > 1 and only the first element will be used
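A note on these warnings: `if` expects a single TRUE/FALSE, while `%in%` returns one value per row, so only the first row of each group is tested. A minimal sketch of a group-level flag, assuming a small made-up data set (the `trees` data frame and its values are hypothetical) and `tree_id` as the grouping variable, could wrap each test in `any()`:

```r
library(dplyr)

# Hypothetical illustration data, not the data from the question
trees <- data.frame(
  tree_id       = c("A", "A", "B", "B"),
  injury_type_1 = c(NA, "Dead or dying", NA, NA),
  injury_type_2 = c(NA, NA, NA, "Minor"),
  stringsAsFactors = FALSE
)

bad <- c("Significant", "Severe", "Dead or dying")

flagged <- trees %>%
  group_by(tree_id) %>%
  # any() collapses the per-row matches to one value per group;
  # mutate() then recycles that single value to every row of the group
  mutate(DAMAGED = as.integer(any(injury_type_1 %in% bad) |
                              any(injury_type_2 %in% bad))) %>%
  ungroup()

print(flagged)
```

Because the flag is computed once per group, every tree in group "A" gets the same value, which is what the `if`/`else` chain was trying to achieve.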
Example data:
df <- structure(list(tree_id = c("F41030808008", "F41030808008", "F41030808008", "F41030808008", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41030808008", "F41030808008"), Siteid = c("F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410"), injury_type_1 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Dead or dying"), injury_type_2 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_3 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), height = c(NA, NA, 122, 5, NA, 35, 185, 245, 300, 102, NA, NA)), row.names = c(NA, -12L), class = "data.frame")
The result I want:
expected_result <- structure(list(tree_id = c("F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007"), Siteid = c("F410", "F410", "F410", "F410", "F410", "F410"), injury_type_1 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_2 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_3 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), height = c(NA, 35, 185, 245, 300, 102)), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 0
Views: 627
Reputation: 199
Solved:
I adapted the following from an answer in another thread: "How to remove groups of observation with dplyr::filter()".
My final code:
df %>%
  group_by(tree_group) %>%
  filter(all(!(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))
This code removes every observation of a group from the filtered set if, anywhere in that group, one of these variables takes the value "Significant", "Severe" or "Dead or dying".
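To illustrate on a cut-down version of the example data, here is a minimal sketch. It assumes that `tree_id` plays the role of `tree_group`, since the posted data has no `tree_group` or `tree_condition` column, and it checks a single injury column only; `df_small` is a hypothetical name.

```r
library(dplyr)

# Reduced, hypothetical version of the example data
df_small <- data.frame(
  tree_id       = c("F41030808008", "F41030808008",
                    "F41031302007", "F41031302007"),
  injury_type_1 = c(NA, "Dead or dying", NA, NA),
  height        = c(122, NA, 35, 185),
  stringsAsFactors = FALSE
)

bad <- c("Significant", "Severe", "Dead or dying")

kept <- df_small %>%
  group_by(tree_id) %>%
  # all(!x) is TRUE only when no row of the group matches, so the
  # whole group is kept or dropped together
  filter(all(!(injury_type_1 %in% bad))) %>%
  ungroup()

# For comparison: any(!x) is TRUE as soon as one row does not match,
# so a group with a single undamaged tree would be kept in full
kept_any <- df_small %>%
  group_by(tree_id) %>%
  filter(any(!(injury_type_1 %in% bad))) %>%
  ungroup()

print(unique(kept$tree_id))
print(nrow(kept_any))
```

The difference lies in where the negation sits: `all(!x)` asks whether every row is clean, while `any(!x)` asks whether at least one row is clean.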
Upvotes: 0
Reputation: 887991
We can use filter_at
library(dplyr)
df %>%
  group_by(tree_group) %>%
  # keep a group only if none of the selected columns contains a listed value
  filter_at(vars(tree_condition, matches('^injury_type_\\d+$')),
            all_vars(!any(. %in% c("Significant","Severe","Dead or dying"))))
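As a side note, `filter_at()` is superseded in current dplyr. A rough sketch of a group-level filter with `if_any()` (available from dplyr 1.0.4) might look like the following; the `trees` data frame and its values are made up for illustration, and `tree_id` stands in for `tree_group`.

```r
library(dplyr)

# Hypothetical illustration data
trees <- data.frame(
  tree_id        = c("A", "A", "B", "B"),
  tree_condition = c("Good", "Severe", "Good", "Good"),
  injury_type_1  = c(NA, NA, NA, "Minor"),
  injury_type_2  = c(NA_character_, NA_character_,
                     NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

bad <- c("Significant", "Severe", "Dead or dying")

kept <- trees %>%
  group_by(tree_id) %>%
  # if_any() marks rows where any selected column holds a listed value;
  # any() lifts that to the group, and ! drops such groups entirely
  filter(!any(if_any(c(tree_condition, matches("^injury_type_\\d+$")),
                     ~ .x %in% bad))) %>%
  ungroup()

print(kept)
```

Selecting the injury columns by pattern means additional `injury_type_*` columns would be picked up without changing the call.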
Upvotes: 1