EML
EML

Reputation: 671

Filtering rows based on two conditions at the ID level

I have long data where a given subject has 4 observations. I want to only include a given id that meets the following conditions:

  1. has at least one 3

  2. has at least one of 1,2 OR NA

My data structure:

df <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3), a=c(NA,1,2,3, NA,3,2,0, NA,NA,1,1))

My unsuccessful attempt (I get an empty data frame):

df %>% dplyr::group_by(id) %>% filter(a==3 & a %in% c(1,2,NA)) 

Upvotes: 1

Views: 967

Answers (3)

AHart
AHart

Reputation: 448

As I read it, the filter should keep ids 1 and 2. So I would use combo of all/any:

    df  %>% 
      group_by(id) %>% 
      filter(all(3 %in% a) & any(c(1,2,NA) %in% a))

Upvotes: 0

akrun
akrun

Reputation: 887118

An option is to group by 'id', create a logic to return single TRUE/FALSE as output. Based on the OP's post, we need both values '3' and either one of the values 1, 2, NA in the column 'a'. So, 3 %in% a returns a logical vector of length 1, then wrap any on the second set where we do a comparison with multiple values or check the NA elements (is.na), merge both logical output with &

library(dplyr)
df %>% 
  group_by(id) %>%
  filter((3 %in% a) & any(c(1, 2) %in% a|is.na(a)) )
# A tibble: 8 x 2
# Groups:   id [2]
#     id     a
#  <dbl> <dbl>
#1     1    NA
#2     1     1
#3     1     2
#4     1     3
#5     2    NA
#6     2     3
#7     2     2
#8     2     0

Upvotes: 4

user10917479
user10917479

Reputation:

I have done this a bit of a long way to show how an idea could work. You can consolidate this a bit.

df %>%
  group_by(id) %>%
  mutate(has_3 = sum(a == 3, na.rm = T) > 0,
         keep_me = has_3 & (sum(is.na(a)) > 0 | sum(a %in% c(1, 2)) > 0)) %>%
  filter(keep_me == TRUE) %>%
  select(id, a)

     id     a
  <dbl> <dbl>
1     1    NA
2     1     1
3     1     2
4     1     3
5     2    NA
6     2     3
7     2     2
8     2     0

Upvotes: 2

Related Questions