Silviculturalist

Reputation: 199

Dplyr, filtering groups if any value in vector list exists

[EDIT: Included reproducible example using dput]

I want to remove an entire group whenever any row in it has one of a given set of values in a variable, i.e. no group may remain in which any row has the value V1, V2 or V3.

E.g. no group may remain that contains a tree susceptible to disease strain/injury type V1, V2 or V3.

However, my calls keep being interpreted as: no tree susceptible to disease strain/injury type V1, V2 or V3 may remain in a group.

Example 1:

df %>% group_by(tree_group) %>% 
  filter(any(!(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(any(!(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))

Example 2:

df  %>% group_by(tree_group) %>% 
  filter(!(any(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(!(any(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))

Both Example 1 and Example 2 yield the same result: the damaged trees are removed from their groups, rather than the affected groups being removed from the result.
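For reference, the two predicates are not logically equivalent when evaluated on a single group's vector. The minimal base R sketch below uses two hypothetical vectors (`mixed` and `clean`, not taken from the question's data) to show what each expression returns. If both calls really return the same rows, it may be worth checking that the grouping was actually applied.

```r
bad <- c("Significant", "Severe", "Dead or dying")

# Hypothetical group containing one damaged tree
mixed <- c(NA, "Severe", NA)
# Hypothetical group containing no damaged trees
clean <- c(NA_character_, NA, NA)

# Example 1 style: TRUE as soon as one tree is NOT damaged
ex1_mixed <- any(!(mixed %in% bad))
ex1_clean <- any(!(clean %in% bad))

# Example 2 style: TRUE only if NO tree is damaged
ex2_mixed <- !any(mixed %in% bad)
ex2_clean <- !any(clean %in% bad)

print(c(ex1_mixed = ex1_mixed, ex1_clean = ex1_clean,
        ex2_mixed = ex2_mixed, ex2_clean = ex2_clean))
```

Note that `NA %in% bad` is `FALSE`, so missing values count as "not damaged" in both forms.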

I also tried, without success, to create a variable (DAMAGED) that marks every tree in a group with 1 if any member of the group is susceptible, and with 0 otherwise:

df %>% 
group_by(tree_group) %>% mutate(if (tree_condition %in% c("Significant","Severe","Dead or dying")){
    DAMAGED=1
  } else if(injury_type_1 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED=1
  } else if(injury_type_2 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED=1
  } else if(injury_type_3 %in% c("Significant","Severe","Dead or dying")) {
    DAMAGED=1
  } else {
    DAMAGED=0
  })

However, this raises the following warnings:

1: In if (tree_condition %in% c("Significant", "Severe", "Dead or dying")) { :
  the condition has length > 1 and only the first element will be used
2: In if (injury_type_1 %in% c("Significant", "Severe", "Dead or dying")) { :
  the condition has length > 1 and only the first element will be used
3: In if (injury_type_2 %in% c("Significant", "Severe", "Dead or dying")) { :
  the condition has length > 1 and only the first element will be used
4: In if (injury_type_3 %in% c("Significant", "Severe", "Dead or dying")) { :
  the condition has length > 1 and only the first element will be used
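The warnings arise because `if` expects a single logical value, while `%in%` returns one value per row of the group. A vectorised alternative is to collapse the per-row test with `any()` inside `mutate()`. The sketch below uses a hypothetical toy data frame (`toy`, with only two injury columns) rather than the real data, so the column set is an assumption.

```r
library(dplyr)

bad <- c("Significant", "Severe", "Dead or dying")

# Hypothetical toy data: group "a" contains one damaged tree, group "b" none
toy <- data.frame(
  tree_group    = c("a", "a", "b", "b"),
  injury_type_1 = c(NA, "Severe", NA, NA),
  injury_type_2 = c(NA_character_, NA, NA, NA),
  stringsAsFactors = FALSE
)

flagged <- toy %>%
  group_by(tree_group) %>%
  # any() reduces the row-wise test to one value per group,
  # which mutate() then recycles to every row of that group
  mutate(DAMAGED = as.integer(any(injury_type_1 %in% bad |
                                  injury_type_2 %in% bad))) %>%
  ungroup()

print(flagged)
```

Filtering on `DAMAGED == 0` afterwards would then keep only the groups without a damaged tree.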

Example data:

df <- structure(list(tree_id = c("F41030808008", "F41030808008", "F41030808008", "F41030808008", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41030808008", "F41030808008"), Siteid = c("F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410", "F410"), injury_type_1 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Dead or dying"), injury_type_2 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_3 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), height = c(NA, NA, 122, 5, NA, 35, 185, 245, 300, 102, NA, NA)), row.names = c(NA, -12L), class = "data.frame")

The result I want:

expected_result <- structure(list(tree_id = c("F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007", "F41031302007"), Siteid = c("F410", "F410", "F410", "F410", "F410", "F410"), injury_type_1 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_2 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), injury_type_3 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), height = c(NA, 35, 185, 245, 300, 102)), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 0

Views: 627

Answers (2)

Silviculturalist

Reputation: 199

Solved:

I adapted the following from an answer in another thread, "How to remove groups of observation with dplyr::filter()".

My final code:

df %>% group_by(tree_group) %>% 
  filter(all(!(tree_condition %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_1 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_2 %in% c("Significant","Severe","Dead or dying")))) %>%
  filter(all(!(injury_type_3 %in% c("Significant","Severe","Dead or dying"))))

This code drops every observation in any group where at least one of these variables takes the value "Significant", "Severe" or "Dead or dying".
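As a check, the same idea can be run against a condensed version of the example data from the question. The example data has no `tree_group` or `tree_condition` column, so this sketch assumes `tree_id` is the grouping key and tests only `injury_type_1`. It also uses `!any(...)`, which is logically equivalent to `all(!(...))`.

```r
library(dplyr)

bad <- c("Significant", "Severe", "Dead or dying")

# Condensed copy of the question's example data (only the columns used here)
df <- data.frame(
  tree_id       = c(rep("F41030808008", 4), rep("F41031302007", 6),
                    rep("F41030808008", 2)),
  injury_type_1 = c(rep(NA_character_, 11), "Dead or dying"),
  height        = c(NA, NA, 122, 5, NA, 35, 185, 245, 300, 102, NA, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(tree_id) %>%               # assumption: tree_id plays the role of tree_group
  filter(!any(injury_type_1 %in% bad)) %>%
  ungroup()

print(result)
```

Only the six rows of `F41031302007` remain, which matches the tree IDs and heights in `expected_result`.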

Upvotes: 0

akrun

Reputation: 887991

We can use filter_at

library(dplyr)
df %>%
   group_by(tree_group) %>%
   filter_at(vars(tree_condition, matches('^injury_type_\\d+$')), 
          any_vars(!. %in%  c("Significant","Severe","Dead or dying")))
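In dplyr 1.0.4 and later, `filter_at()` is superseded by `if_any()` and `if_all()`. Note also that `any_vars()` tests each row, so it keeps a row whenever at least one selected column is outside the set. The sketch below is one possible way to get group-level removal instead: flag damaged rows across the selected columns with `if_any()`, then drop every group containing a flagged row. The toy data frame and the helper column `damaged` are hypothetical.

```r
library(dplyr)

bad <- c("Significant", "Severe", "Dead or dying")

# Hypothetical toy data: group "a" has a damaged tree, group "b" does not
toy <- data.frame(
  tree_group     = c("a", "a", "b", "b"),
  tree_condition = c(NA, "Minor", "Minor", NA),
  injury_type_1  = c(NA, "Severe", NA, NA),
  injury_type_2  = c(NA_character_, NA, NA, NA),
  stringsAsFactors = FALSE
)

kept <- toy %>%
  # Row-wise flag: TRUE if any selected column holds a "bad" value
  mutate(damaged = if_any(c(tree_condition, matches("^injury_type_\\d+$")),
                          ~ .x %in% bad)) %>%
  group_by(tree_group) %>%
  # Group-wise test: keep the group only if no row is flagged
  filter(!any(damaged)) %>%
  ungroup() %>%
  select(-damaged)

print(kept)
```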

Upvotes: 1
