Reputation: 1315
Is there a more efficient way of coding multiple exclusion filters in R using dplyr? My problem is similar to this OP.
Simplified example of my data:
data<-data.frame(age=c(1,1,1,2,2,2,3,3,4,4,4,4,4,4),wt=c(32,12,5,32,80,32,1,0,4,8,1,1,2,50))
I go on to filter out extreme values based on numerous different conditions depending on the age variable and the mean and sd of the wt variable.
My current solution looks like this:
data_clean <- data %>% filter(
  !(age == 1 & wt < 6),
  !(age == 1 & wt > 20),
  !(age == 2 & wt >= (mean((data %>% filter(age == 2))$wt) + sd((data %>% filter(age == 2))$wt))),
  !(age == 2 & wt <= (mean((data %>% filter(age == 2))$wt) - sd((data %>% filter(age == 2))$wt))),
  !(age == 3 & wt >= (mean((data %>% filter(age == 3))$wt) + sd((data %>% filter(age == 3))$wt))),
  !(age == 3 & wt <= (mean((data %>% filter(age == 3))$wt) - sd((data %>% filter(age == 3))$wt))),
  !(age == 4 & wt >= (mean((data %>% filter(age == 4))$wt) + 2 * sd((data %>% filter(age == 4))$wt))),
  !(age == 4 & wt <= (mean((data %>% filter(age == 4))$wt) - 2 * sd((data %>% filter(age == 4))$wt)))
)
I have also tried:
filter(!wt %in% wt[age==2 & wt >= (mean(wt) +sd(wt))],
!wt %in% wt[age==2 & wt <= (mean(wt) -sd(wt))]), etc...
EDIT: Note that the filter arguments are not the same for all ages. I am applying the same filter to age==2 and age==3 and then a different filter to age==4. In all cases, the mean() and sd() are relative to the age. So, at age==2, I am filtering based on the mean() wt at age==2 and the sd() at age==2.
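For reference, the per-age mean and sd that these conditions depend on can be computed in one grouped pass rather than by re-filtering the data inside each condition. This is only a minimal sketch on the example data; the names age_stats, mean_wt and sd_wt are illustrative and not part of my actual code.

```r
library(dplyr)

# Example data from above
data <- data.frame(
  age = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4),
  wt  = c(32, 12, 5, 32, 80, 32, 1, 0, 4, 8, 1, 1, 2, 50)
)

# One row per age, holding the mean and sd of wt within that age
age_stats <- data %>%
  group_by(age) %>%
  summarise(mean_wt = mean(wt), sd_wt = sd(wt))

age_stats
```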
However, both of these approaches are really cumbersome and will not be feasible for me, given that I actually have 8 different exclusion criteria across multiple ages (1-50).
How can I group multiple arguments together? For example, the filtering is the same for age==2 and age==3. Instead of having 4 lines, is there a way to only have 2 lines?
In this example, the final dataset should look like this:
age wt
1 12
2 32
2 32
3 1
3 0
4 4
4 8
4 1
4 1
4 2
Upvotes: 0
Views: 205
Reputation: 34501
You can simplify things by standardizing the wt variable by age:
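library(dplyr)
data %>%
  group_by(age) %>%
  # x = absolute z-score of wt within each age group
  mutate(x = abs(scale(wt)[, 1])) %>%
  # keep rows inside the limits; the bounds mirror the question's exclusions
  filter(
    age %in% 2:3 & x < 1 |
      age == 4 & x < 2 |
      age == 1 & wt >= 6 & wt <= 20
  ) %>%
  select(-x) %>%
  ungroup()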
# A tibble: 10 x 2
     age    wt
   <dbl> <dbl>
 1     1    12
 2     2    32
 3     2    32
 4     3     1
 5     3     0
 6     4     4
 7     4     8
 8     4     1
 9     4     1
10     4     2
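If the number of ages and criteria grows (the question mentions ages 1-50), one possible way to avoid a long filter() expression is to keep the limits in a small lookup table and join it to the data. The following is a sketch under assumptions, not a tested recipe for the real data: the rules table and its columns k, lo and hi are hypothetical names introduced here, and z is the same per-age standardization as abs(scale(wt)[, 1]).

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  age = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4),
  wt  = c(32, 12, 5, 32, 80, 32, 1, 0, 4, 8, 1, 1, 2, 50)
)

# Hypothetical rule table, one row per age:
#   k      = number of sds allowed around the per-age mean (NA = no sd rule)
#   lo, hi = fixed bounds on wt (NA = no fixed bound)
rules <- data.frame(
  age = c(1, 2, 3, 4),
  k   = c(NA, 1, 1, 2),
  lo  = c(6, NA, NA, NA),
  hi  = c(20, NA, NA, NA)
)

data_clean <- data %>%
  group_by(age) %>%
  # absolute z-score of wt within each age group
  mutate(z = abs(wt - mean(wt)) / sd(wt)) %>%
  ungroup() %>%
  left_join(rules, by = "age") %>%
  # a rule only applies when its limit is not NA; the inequalities
  # mirror the question's exclusions (>= / <= mean +/- k*sd, wt < 6, wt > 20)
  filter(
    is.na(k)  | z < k,
    is.na(lo) | wt >= lo,
    is.na(hi) | wt <= hi
  ) %>%
  select(age, wt)

data_clean
```

With this layout, ages that share a criterion simply share the same values in the rule table, so adding an age means adding a row rather than another filter condition. Note that an age with a single observation has an sd of NA, so its rows would be dropped by the sd rule unless that case is handled separately.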
Upvotes: 2