sutsabs
sutsabs

Reputation: 427

how to filter groups matching more than one values?

 id drug_name     med_start  med_end   
 <dbl> <chr>         <date>     <date>    
   1 pembrolizumab 2018-02-07 2018-02-07
   1 pembrolizumab 2018-02-28 2018-02-28
   2 pembrolizumab 2018-01-05 2018-01-05
   2 nivolumab     2018-09-20 2018-09-20
   2 nivolumab     2018-10-03 2018-10-03
   2 nivolumab     2018-11-01 2018-11-01
  1. I am trying to get ids who have both pembrolizumab and nivolumab in drug_name. Can I do a group_by over id? And then filter with both conditions? For above table, id 2 has both drug_names. I might have situation where I will be filtering more than 2 drug_names.

  2. I am also trying to find to see if the gap between two med_start is greater than x days. Let's say 30 days. Basically filter ids who have gap of 30 days between med_start.

Here is the code for above data

data  <- structure(list(id = structure(c(1, 1, 2, 2, 2, 2), class = "int"), 
    drug_name = c("pembrolizumab", "pembrolizumab", "pembrolizumab", 
    "nivolumab", "nivolumab", "nivolumab"), med_start = structure(c(17569, 
    17590, 17536, 17794, 17807, 17836), class = "Date"), med_end = structure(c(17569, 
    17590, 17536, 17794, 17807, 17836), class = "Date")), row.names = c(NA, 
-6L), groups = structure(list(patient_id = structure(c(1.49283861796358e-314, 
1.6423825257779e-313), class = "integer64"), .rows = structure(list(
    1:2, 3:6), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Upvotes: 2

Views: 63

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 102529

Here are some base R options

  1. for the first question
> unique(
+   subset(
+     data,
+     ave(match(drug_name, c("pembrolizumab", "nivolumab")), id, FUN = var) > 0,
+     select = id
+   )
+ )
# A tibble: 1 x 1
  id
  <int>
1 2
  1. for the second question
> subset(
+   data,
+   ave(as.integer(med_start), id, FUN = function(x) max(diff(x))) <= 30
+ )
# A tibble: 2 x 4
  id    drug_name     med_start  med_end
  <int> <chr>         <date>     <date>
1 1     pembrolizumab 2018-02-07 2018-02-07
2 1     pembrolizumab 2018-02-28 2018-02-28

Upvotes: -1

akrun
akrun

Reputation: 887711

We group by 'id', and filter where all the drugs of interest are %in% the 'drug_name' column, and extract the unique 'id'

library(dplyr)
data %>%
    group_by(id) %>%
    filter(all(c("pembrolizumab", "nivolumab") %in% drug_name)) %>% 
    ungroup %>%
    pull(id)%>% 
   unique

-output

[1] 2

Upvotes: 2

Related Questions