Reputation: 427
id drug_name med_start med_end
<dbl> <chr> <date> <date>
1 pembrolizumab 2018-02-07 2018-02-07
1 pembrolizumab 2018-02-28 2018-02-28
2 pembrolizumab 2018-01-05 2018-01-05
2 nivolumab 2018-09-20 2018-09-20
2 nivolumab 2018-10-03 2018-10-03
2 nivolumab 2018-11-01 2018-11-01
I am trying to get ids who have both pembrolizumab
and nivolumab
in drug_name
. Can I do a group_by
over id
? And then filter with both conditions?
For above table, id 2 has both drug_names
. I might have situation where I will be filtering more than 2 drug_names
.
I am also trying to find to see if the gap between two med_start
is greater than x days. Let's say 30 days. Basically filter ids who have gap of 30 days between med_start
.
Here is the code for above data
data <- structure(list(id = structure(c(1, 1, 2, 2, 2, 2), class = "int"),
drug_name = c("pembrolizumab", "pembrolizumab", "pembrolizumab",
"nivolumab", "nivolumab", "nivolumab"), med_start = structure(c(17569,
17590, 17536, 17794, 17807, 17836), class = "Date"), med_end = structure(c(17569,
17590, 17536, 17794, 17807, 17836), class = "Date")), row.names = c(NA,
-6L), groups = structure(list(patient_id = structure(c(1.49283861796358e-314,
1.6423825257779e-313), class = "integer64"), .rows = structure(list(
1:2, 3:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Upvotes: 2
Views: 63
Reputation: 102529
Here are some base R options
> unique(
+ subset(
+ data,
+ ave(match(drug_name, c("pembrolizumab", "nivolumab")), id, FUN = var) > 0,
+ select = id
+ )
+ )
# A tibble: 1 x 1
id
<int>
1 2
> subset(
+ data,
+ ave(as.integer(med_start), id, FUN = function(x) max(diff(x))) <= 30
+ )
# A tibble: 2 x 4
id drug_name med_start med_end
<int> <chr> <date> <date>
1 1 pembrolizumab 2018-02-07 2018-02-07
2 1 pembrolizumab 2018-02-28 2018-02-28
Upvotes: -1
Reputation: 887711
We group by 'id', and filter
where all
the drugs of interest are %in%
the 'drug_name' column, and extract the unique
'id'
library(dplyr)
data %>%
group_by(id) %>%
filter(all(c("pembrolizumab", "nivolumab") %in% drug_name)) %>%
ungroup %>%
pull(id)%>%
unique
-output
[1] 2
Upvotes: 2