Match data within multiple time-frames with dplyr

Question

I have a dataframe df with a variable gfr and the corresponding gfr_date. I have a second dataframe drug which indicate when a drug was started and stopped. The drug can be started and stopped multiple times as in id 1.

I want to "join" the two dataframes (wanted) to get a new variable (drug1) which is 1 if the id used a certain drug at gfr_date and 0 if not.

Some sample data df and drug:

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 
                            2, 2, 3, 3, 3, 3), 
                     gfr = c(90, 109, 84, 81, 92, 76, 40, 64, 41, 
                             64, 28, 82, 61, 53, 47, 31, 29, 31, 31), 
                     gfr_date = structure(c(16812, 16996, 17178, 
                                            17372, 17542, 17731, 
                                            16686, 16741, 16898, 
                                            16909, 17098, 17120, 
                                            17295, 17442, 17668, 
                                            17283, 17435, 17568, 
                                            17758), class = "Date")), 
                row.names = c(NA, -19L), class = c("tbl_df", "tbl", "data.frame"))

df
#> # A tibble: 19 x 3
#>       id   gfr gfr_date  
#>          
#>  1     1    90 2016-01-12
#>  2     1   109 2016-07-14
#>  3     1    84 2017-01-12
#>  4     1    81 2017-07-25
#>  5     1    92 2018-01-11
#>  6     1    76 2018-07-19
#>  7     2    40 2015-09-08
#>  8     2    64 2015-11-02
#>  9     2    41 2016-04-07
#> 10     2    64 2016-04-18
#> 11     2    28 2016-10-24
#> 12     2    82 2016-11-15
#> 13     2    61 2017-05-09
#> 14     2    53 2017-10-03
#> 15     2    47 2018-05-17
#> 16     3    31 2017-04-27
#> 17     3    29 2017-09-26
#> 18     3    31 2018-02-06
#> 19     3    31 2018-08-15

drug <- structure(list(id = c(1, 1, 2, 3), 
               drug = c("drug1", "drug1", "drug1", "drug1"), 
               drstartd = structure(c(16261, 17008, 17443, 16252), class = "Date"), 
               drstopd = structure(c(16417, NA, NA, 17412), class = "Date")), 
          class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))

drug
#> # A tibble: 4 x 4
#>      id drug  drstartd   drstopd   
#>              
#> 1     1 drug1 2014-07-10 2014-12-13
#> 2     1 drug1 2016-07-26 NA        
#> 3     2 drug1 2017-10-04 NA        
#> 4     3 drug1 2014-07-01 2017-09-03

wanted <- df %>% 
  mutate(drug1 = c(0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0))

wanted
#> # A tibble: 19 x 4
#>       id   gfr gfr_date   drug1
#>           
#>  1     1    90 2016-01-12     0
#>  2     1   109 2016-07-14     0
#>  3     1    84 2017-01-12     1
#>  4     1    81 2017-07-25     1
#>  5     1    92 2018-01-11     1
#>  6     1    76 2018-07-19     1
#>  7     2    40 2015-09-08     0
#>  8     2    64 2015-11-02     0
#>  9     2    41 2016-04-07     0
#> 10     2    64 2016-04-18     0
#> 11     2    28 2016-10-24     0
#> 12     2    82 2016-11-15     0
#> 13     2    61 2017-05-09     0
#> 14     2    53 2017-10-03     0
#> 15     2    47 2018-05-17     1
#> 16     3    31 2017-04-27     1
#> 17     3    29 2017-09-26     0
#> 18     3    31 2018-02-06     0
#> 19     3    31 2018-08-15     0

^{Created on 2019-08-19 by the reprex package (v0.3.0)}

I hope the wanted dataframe is correct as I did it by hand...Thanks for your help!

EDIT Removed the STATA attributes from the reprex.

Andreas · Accepted Answer

I believe this should do what you're after. Note that I made the id column of 'drug' into a regular numeric (I got rid of 'label = "ID", format.stata = "%12.0g"') so that the join on 'id' would work

df %>%
  left_join(drug, by = 'id') %>%
  mutate(drug1 = gfr_date >= drstartd & (gfr_date <= drstopd | is.na(drstopd))) %>%
  group_by(id, gfr, gfr_date) %>%
  summarize(drug1 = as.numeric(any(drug1))) %>%
  arrange(id, gfr_date)

Match data within multiple time-frames with dplyr

Answers (1)

Related Questions