Reputation:
I have two time series data. One including drug name, start date of usage, Stop date of usage, and dose of drug and second data including visit date and score,
data1<- data.frame("Drug Name" = c("Drug1","Drug1","Drug2","Drug1","Drug3","Drug2",
"Drug4","Drug5","Drug1"),
"Start Date" = c("7/1/2016","1/1/2016", "8/6/2015","2/1/2015","6/14/2017",
"6/21/2017","1/24/2018","7/30/2018","7/30/2018"),
"Stop Date "=c("1/14/2017","1/14/2017", "1/14/2017","1/14/2017"
,"1/24/2018","6/29/2018","6/29/2018","Ongoing","Ongoing"),
"Dose"=c(12,20,32,3,5,6,6,8,9))
data2<-data.frame("visitdate"=c("8/24/2016","8/24/2016", "10/19/2016","12/7/2016","12/21/2016",
"3/22/2017","5/10/2017", "6/14/2017", "7/12/2017","9/27/2017",
"11/29/2017", "1/24/2018","3/21/2018","5/30/2018","8/15/2018",
"10/3/2018", "11/28/2018"),
"Score"=c(1,2,3,34,6,7,9,5,6,8,5,5,7,9,1,2,5))
I would like to merge these two data in a way that for example on visit date 8/24/2016
tells me how many drugs the patient has taken with its dose plus the clinical score.
Upvotes: 0
Views: 177
Reputation: 30474
There may be some pre-processing of data to consider early on.
First, your example above has column names with spaces, would be nice to avoid. I edited and removed spaces for this example.
In addition, you have "Ongoing" as a date. Would recommend converting to dates using as.Date
. However, after converting those with "Ongoing" will be included as NA
. These can be set to Inf
(infinity) which will work.
For example:
data1$StartDate <- as.Date(data1$StartDate, format = "%m/%d/%Y")
data1$StopDate <- as.Date(data1$StopDate, format = "%m/%d/%Y")
data2$VisitDate <- as.Date(data2$VisitDate, format = "%m/%d/%Y")
data1$StopDate[8:9] <- Inf
There are a number of other ways to address this, depending on your source of data.
After that, you can use tidyverse
and fuzzyjoin
to do the following. Using fuzzy_left_join
you can join your two data frames together, where only the dates within the date range are included.
You might consider leaving the results in long form. However, if you want the wide format, you can use pivot_wider
. The final select
puts the columns in order numerically as in your example.
library(tidyverse)
library(fuzzyjoin)
fuzzy_left_join(data2,
data1,
by = c("VisitDate" = "StartDate",
"VisitDate" = "StopDate"),
match_fun = list(`>=`, `<=`)) %>%
select(-StartDate, -StopDate) %>%
group_by(VisitDate, Score) %>%
mutate(rn = row_number(),
NumDrugs = ifelse(all(is.na(DrugName)) == TRUE, 0, n())) %>%
pivot_wider(id_cols = c(VisitDate, Score, NumDrugs), names_from = rn, values_from = c(DrugName, Dose)) %>%
select(VisitDate, Score, NumDrugs, names(.)[-c(1:3)][order(parse_number(names(.)[-c(1:3)]))])
Output
VisitDate Score NumDrugs DrugName_1 Dose_1 DrugName_2 Dose_2 DrugName_3 Dose_3 DrugName_4 Dose_4
<date> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 2016-08-24 1 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
2 2016-08-24 2 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
3 2016-10-19 3 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
4 2016-12-07 34 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
5 2016-12-21 6 4 Drug1 12 Drug1 20 Drug2 32 Drug1 3
6 2017-03-22 7 0 NA NA NA NA NA NA NA NA
7 2017-05-10 9 0 NA NA NA NA NA NA NA NA
8 2017-06-14 5 1 Drug3 5 NA NA NA NA NA NA
9 2017-07-12 6 2 Drug3 5 Drug2 6 NA NA NA NA
10 2017-09-27 8 2 Drug3 5 Drug2 6 NA NA NA NA
11 2017-11-29 5 2 Drug3 5 Drug2 6 NA NA NA NA
12 2018-01-24 5 3 Drug3 5 Drug2 6 Drug4 6 NA NA
13 2018-03-21 7 2 Drug2 6 Drug4 6 NA NA NA NA
14 2018-05-30 9 2 Drug2 6 Drug4 6 NA NA NA NA
15 2018-08-15 1 2 Drug5 8 Drug1 9 NA NA NA NA
16 2018-10-03 2 2 Drug5 8 Drug1 9 NA NA NA NA
17 2018-11-28 5 2 Drug5 8 Drug1 9 NA NA NA NA
Data
(Before converting dates)
data1 <- structure(list(DrugName = c("Drug1", "Drug1", "Drug2", "Drug1",
"Drug3", "Drug2", "Drug4", "Drug5", "Drug1"), StartDate = c("7/1/2016",
"1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017", "6/21/2017",
"1/24/2018", "7/30/2018", "7/30/2018"), StopDate = c("1/14/2017",
"1/14/2017", "1/14/2017", "1/14/2017", "1/24/2018", "6/29/2018",
"6/29/2018", NA, NA), Dose = c(12, 20, 32, 3, 5, 6, 6, 8, 9)), class = "data.frame", row.names = c(NA,
-9L))
data2 <- structure(list(VisitDate = c("8/24/2016", "8/24/2016", "10/19/2016",
"12/7/2016", "12/21/2016", "3/22/2017", "5/10/2017", "6/14/2017",
"7/12/2017", "9/27/2017", "11/29/2017", "1/24/2018", "3/21/2018",
"5/30/2018", "8/15/2018", "10/3/2018", "11/28/2018"), Score = c(1,
2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5)), class = "data.frame", row.names = c(NA,
-17L))
Upvotes: 1