Reputation: 976
I want to filter my dataset to keep cases with observations in a specific column. To illustrate:
help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))
Creates
deid score.a
1 5 NA
2 5 1
3 5 1
4 5 1
5 5 NA
6 12 NA
7 12 NA
8 12 NA
9 12 NA
10 17 NA
11 17 1
12 17 NA
And I want to tell dplyr to keep every case (deid) that has at least one observation in score.a, including that case's NA rows. Thus, I want it to return:
deid score.a
1 5 NA
2 5 1
3 5 1
4 5 1
5 5 NA
6 17 NA
7 17 1
8 17 NA
I ran the code help %>% group_by(deid) %>% filter(score.a > 0), but it drops the NA rows as well. Thank you for any assistance.
Edit: A similar question was asked here: "How to remove groups of observation with dplyr::filter()". However, the answer there uses the 'all' condition, whereas this case requires the 'any' condition.
Upvotes: 2
Views: 1039
Reputation: 4554
library(dplyr)
help %>% group_by(deid) %>% filter(sum(score.a, na.rm = TRUE) > 0)
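Note that the sum-based test relies on the scores being positive: a group whose non-NA scores sum to zero or less would be dropped even though it has observations. A minimal sketch of a variant that tests directly for "at least one non-NA value", assuming the same toy data as in the question (the name `kept` is only illustrative):

```r
library(dplyr)

# Same toy data as in the question
help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# Keep every row of a deid group that has at least one non-missing score.a,
# regardless of the sign of the observed values
kept <- help %>%
  group_by(deid) %>%
  filter(any(!is.na(score.a))) %>%
  ungroup()

print(kept)
```

With this data, groups 5 and 17 are kept in full and group 12 (all NA) is removed.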
Upvotes: 2
Reputation: 887078
Try
library(dplyr)
help %>%
group_by(deid) %>%
filter(any(score.a >0 & !is.na(score.a)))
# deid score.a
#1 5 NA
#2 5 1
#3 5 1
#4 5 1
#5 5 NA
#6 17 NA
#7 17 1
#8 17 NA
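To see what the `!is.na()` guard contributes, here is a small base R sketch of how comparisons with NA behave. The vectors `some_obs` and `no_obs` are made up for illustration and stand in for the score.a values of one group each:

```r
# One group with an observation, one group with none
some_obs <- c(NA, 1, NA)
no_obs   <- c(NA_real_, NA_real_)

# A comparison with NA yields NA, not FALSE
cmp <- some_obs > 0

# any() is TRUE as soon as one element is TRUE, even if others are NA
any_some <- any(some_obs > 0)

# With only NAs, the unguarded test is NA rather than FALSE
unguarded <- any(no_obs > 0)

# NA & FALSE is FALSE, so the guarded test is a clean FALSE
guarded <- any(no_obs > 0 & !is.na(no_obs))

print(list(cmp = cmp, any_some = any_some,
           unguarded = unguarded, guarded = guarded))
```

dplyr's filter() only keeps rows where the condition is TRUE, so an NA condition also drops the group; the guard mainly makes the test explicit and returns a proper TRUE/FALSE per group.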
Or a similar approach with data.table
library(data.table)
setDT(help)[, if(any(score.a>0 & !is.na(score.a))) .SD , deid]
# deid score.a
#1: 5 NA
#2: 5 1
#3: 5 1
#4: 5 1
#5: 5 NA
#6: 17 NA
#7: 17 1
#8: 17 NA
If instead the condition is to keep only the 'deid's where all the non-NA values in 'score.a' are > 0, then the above code can be modified to
setDT(help)[, if(!all(is.na(score.a)) &
all(score.a[!is.na(score.a)]>0)) .SD , deid]
# deid score.a
#1: 5 NA
#2: 5 1
#3: 5 1
#4: 5 1
#5: 5 NA
#6: 17 NA
#7: 17 1
#8: 17 NA
Suppose one of the 'score.a' in 'deid' group is less than 0,
help$score.a[3] <- -1
the above code would return
setDT(help)[, if(!all(is.na(score.a)) &
all(score.a[!is.na(score.a)]>0)) .SD , deid]
# deid score.a
#1: 17 NA
#2: 17 1
#3: 17 NA
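The same 'all' condition can probably be written with dplyr as well. This is a sketch rather than part of the original answer; it assumes the toy data from the question with the third score already changed to -1, and the name `all_positive` is only illustrative:

```r
library(dplyr)

# Toy data from the question, with the third score set to -1
help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, -1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# Keep a deid group only if it has at least one non-NA score
# and every non-NA score is greater than 0
all_positive <- help %>%
  group_by(deid) %>%
  filter(any(!is.na(score.a)) && all(score.a > 0, na.rm = TRUE)) %>%
  ungroup()

print(all_positive)
```

The `any(!is.na(score.a))` part is needed because `all()` on an empty set of values (after `na.rm = TRUE`) is TRUE, which would otherwise keep the all-NA group 12.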
Upvotes: 5