Reputation: 671
I have long data where a given subject has 4 observations. I want to only include a given id that meets the following conditions:
has at least one 3
has at least one of 1,2 OR NA
My data structure:
df <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3), a=c(NA,1,2,3, NA,3,2,0, NA,NA,1,1))
My unsuccessful attempt (I get an empty data frame):
df %>% dplyr::group_by(id) %>% filter(a==3 & a %in% c(1,2,NA))
Upvotes: 1
Views: 967
Reputation: 448
As I read it, the filter should keep ids 1 and 2. So I would use combo of all/any:
df %>%
group_by(id) %>%
filter(all(3 %in% a) & any(c(1,2,NA) %in% a))
Upvotes: 0
Reputation: 887118
An option is to group by 'id', create a logic to return single TRUE/FALSE as output. Based on the OP's post, we need both values '3' and either one of the values 1, 2, NA in the column 'a'. So, 3 %in% a
returns a logical vector of length 1, then wrap any
on the second set where we do a comparison with multiple values or check the NA elements (is.na
), merge both logical output with &
library(dplyr)
df %>%
group_by(id) %>%
filter((3 %in% a) & any(c(1, 2) %in% a|is.na(a)) )
# A tibble: 8 x 2
# Groups: id [2]
# id a
# <dbl> <dbl>
#1 1 NA
#2 1 1
#3 1 2
#4 1 3
#5 2 NA
#6 2 3
#7 2 2
#8 2 0
Upvotes: 4
Reputation:
I have done this a bit of a long way to show how an idea could work. You can consolidate this a bit.
df %>%
group_by(id) %>%
mutate(has_3 = sum(a == 3, na.rm = T) > 0,
keep_me = has_3 & (sum(is.na(a)) > 0 | sum(a %in% c(1, 2)) > 0)) %>%
filter(keep_me == TRUE) %>%
select(id, a)
id a
<dbl> <dbl>
1 1 NA
2 1 1
3 1 2
4 1 3
5 2 NA
6 2 3
7 2 2
8 2 0
Upvotes: 2