Reputation: 562
I have a dataframe df
with a variable gfr
and the corresponding gfr_date
. I have a second dataframe drug
which indicate when a drug was started and stopped. The drug can be started and stopped multiple times as in id 1.
I want to "join" the two dataframes (wanted
) to get a new variable (drug1
) which is 1 if the id used a certain drug at gfr_date
and 0 if not.
Some sample data df
and drug
:
library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3),
gfr = c(90, 109, 84, 81, 92, 76, 40, 64, 41,
64, 28, 82, 61, 53, 47, 31, 29, 31, 31),
gfr_date = structure(c(16812, 16996, 17178,
17372, 17542, 17731,
16686, 16741, 16898,
16909, 17098, 17120,
17295, 17442, 17668,
17283, 17435, 17568,
17758), class = "Date")),
row.names = c(NA, -19L), class = c("tbl_df", "tbl", "data.frame"))
df
#> # A tibble: 19 x 3
#> id gfr gfr_date
#> <dbl> <dbl> <date>
#> 1 1 90 2016-01-12
#> 2 1 109 2016-07-14
#> 3 1 84 2017-01-12
#> 4 1 81 2017-07-25
#> 5 1 92 2018-01-11
#> 6 1 76 2018-07-19
#> 7 2 40 2015-09-08
#> 8 2 64 2015-11-02
#> 9 2 41 2016-04-07
#> 10 2 64 2016-04-18
#> 11 2 28 2016-10-24
#> 12 2 82 2016-11-15
#> 13 2 61 2017-05-09
#> 14 2 53 2017-10-03
#> 15 2 47 2018-05-17
#> 16 3 31 2017-04-27
#> 17 3 29 2017-09-26
#> 18 3 31 2018-02-06
#> 19 3 31 2018-08-15
drug <- structure(list(id = c(1, 1, 2, 3),
drug = c("drug1", "drug1", "drug1", "drug1"),
drstartd = structure(c(16261, 17008, 17443, 16252), class = "Date"),
drstopd = structure(c(16417, NA, NA, 17412), class = "Date")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
drug
#> # A tibble: 4 x 4
#> id drug drstartd drstopd
#> <dbl> <chr> <date> <date>
#> 1 1 drug1 2014-07-10 2014-12-13
#> 2 1 drug1 2016-07-26 NA
#> 3 2 drug1 2017-10-04 NA
#> 4 3 drug1 2014-07-01 2017-09-03
wanted <- df %>%
mutate(drug1 = c(0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0))
wanted
#> # A tibble: 19 x 4
#> id gfr gfr_date drug1
#> <dbl> <dbl> <date> <dbl>
#> 1 1 90 2016-01-12 0
#> 2 1 109 2016-07-14 0
#> 3 1 84 2017-01-12 1
#> 4 1 81 2017-07-25 1
#> 5 1 92 2018-01-11 1
#> 6 1 76 2018-07-19 1
#> 7 2 40 2015-09-08 0
#> 8 2 64 2015-11-02 0
#> 9 2 41 2016-04-07 0
#> 10 2 64 2016-04-18 0
#> 11 2 28 2016-10-24 0
#> 12 2 82 2016-11-15 0
#> 13 2 61 2017-05-09 0
#> 14 2 53 2017-10-03 0
#> 15 2 47 2018-05-17 1
#> 16 3 31 2017-04-27 1
#> 17 3 29 2017-09-26 0
#> 18 3 31 2018-02-06 0
#> 19 3 31 2018-08-15 0
Created on 2019-08-19 by the reprex package (v0.3.0)
I hope the wanted dataframe is correct as I did it by hand...Thanks for your help!
EDIT Removed the STATA attributes from the reprex.
Upvotes: 0
Views: 77
Reputation: 344
I believe this should do what you're after. Note that I made the id column of 'drug' into a regular numeric (I got rid of 'label = "ID", format.stata = "%12.0g"') so that the join on 'id' would work
df %>%
left_join(drug, by = 'id') %>%
mutate(drug1 = gfr_date >= drstartd & (gfr_date <= drstopd | is.na(drstopd))) %>%
group_by(id, gfr, gfr_date) %>%
summarize(drug1 = as.numeric(any(drug1))) %>%
arrange(id, gfr_date)
Upvotes: 1