Blundering Ecologist
Reputation: 1315

Grouping multiple dplyr::filter() expressions in R

Is there a more efficient way to code multiple exclusion filters in R using dplyr? My problem is similar to this question.

Simplified example of my data:

data <- data.frame(age = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4), wt = c(32, 12, 5, 32, 80, 32, 1, 0, 4, 8, 1, 1, 2, 50))

I go on to filter out extreme values based on numerous different conditions depending on the age variable and the mean and sd of the wt variable.

My current solution looks like this:

data_clean <- data %>% filter(
    !(age == 1 & wt < 6),
    !(age == 1 & wt > 20),
    !(age == 2 & wt >= (mean((data %>% filter(age == 2))$wt) + sd((data %>% filter(age == 2))$wt))),
    !(age == 2 & wt <= (mean((data %>% filter(age == 2))$wt) - sd((data %>% filter(age == 2))$wt))),
    !(age == 3 & wt >= (mean((data %>% filter(age == 3))$wt) + sd((data %>% filter(age == 3))$wt))),
    !(age == 3 & wt <= (mean((data %>% filter(age == 3))$wt) - sd((data %>% filter(age == 3))$wt))),
    !(age == 4 & wt >= (mean((data %>% filter(age == 4))$wt) + 2 * sd((data %>% filter(age == 4))$wt))),
    !(age == 4 & wt <= (mean((data %>% filter(age == 4))$wt) - 2 * sd((data %>% filter(age == 4))$wt)))
)

I have also tried:

filter(!wt %in% wt[age==2 & wt >= (mean(wt) +sd(wt))], 
       !wt %in% wt[age==2 & wt <= (mean(wt) -sd(wt))]), etc...

EDIT: Note that the filter arguments are not the same for all ages. I am applying the same filter to age==2 and age==3 and then a different filter to age==4. In all cases, the mean() and sd() are relative to the age. So, at age==2, I am filtering based on the mean() wt at age==2 and the sd() at age==2.

But these solutions are really cumbersome and are not going to be feasible for me, given that I actually have 8 different exclusion criteria across multiple ages (1-50).

How can I group multiple arguments together? For example, the filtering is the same for age==2 and age==3. Instead of having 4 lines, is there a way to only have 2 lines?

In this example, the final dataset should look like this:

     age  wt
      1  12
      2  32
      2  32
      3   1
      3   0
      4   4
      4   8
      4   1
      4   1
      4   2

Upvotes: 0

Views: 205

Answers (1)

lroha
Reputation: 34501

You can simplify things by standardizing the wt variable by age:

library(dplyr)

data %>%
  group_by(age) %>%
  mutate(x = abs(scale(wt)[,1])) %>%
  filter((age %in% 2:3 & x < 1) | (age == 4 & x < 2) | (age == 1 & wt >= 6 & wt <= 20)) %>%
  select(-x) %>%
  ungroup()

# A tibble: 10 x 2
     age    wt
   <dbl> <dbl>
 1     1    12
 2     2    32
 3     2    32
 4     3     1
 5     3     0
 6     4     4
 7     4     8
 8     4     1
 9     4     1
10     4     2
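
If the number of age-specific rules grows (the question mentions ages 1-50), one option is to keep the cutoffs in a small lookup table and join it on age, so the filter itself stays short. The following is only a sketch under some assumptions: the `rules` table and its `max_z` column are hypothetical names introduced here for illustration, and the boundaries follow the question's exclusions (drop `wt < 6` or `wt > 20` at age 1, and drop values at or beyond the cutoff in standard deviations elsewhere).

```r
library(dplyr)

data <- data.frame(
  age = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4),
  wt  = c(32, 12, 5, 32, 80, 32, 1, 0, 4, 8, 1, 1, 2, 50)
)

# Hypothetical lookup table: max_z is the exclusive cutoff on |z| for each age.
# Age 1 has no row here because it uses fixed bounds instead of a z-score rule.
rules <- data.frame(age = c(2, 3, 4), max_z = c(1, 1, 2))

data_clean <- data %>%
  group_by(age) %>%
  mutate(z = abs(scale(wt)[, 1])) %>%   # |wt - mean(wt)| / sd(wt) within each age
  ungroup() %>%
  left_join(rules, by = "age") %>%      # max_z is NA for ages without a rule
  filter(
    (age == 1 & wt >= 6 & wt <= 20) |   # fixed bounds for age 1
      (age != 1 & z < max_z)            # z-score rule for all other ages
  ) %>%
  select(age, wt)

data_clean
```

With this layout, adding a rule for another age should only mean adding a row to `rules`; ages that share a cutoff just repeat the same `max_z` value.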

Upvotes: 2
