user12035904
user12035904

Reputation:

How to merge two time series data in R?

I have two time series data. One including drug name, start date of usage, Stop date of usage, and dose of drug and second data including visit date and score,

data1<- data.frame("Drug Name" = c("Drug1","Drug1","Drug2","Drug1","Drug3","Drug2",
                                   "Drug4","Drug5","Drug1"), 
                   "Start Date" = c("7/1/2016","1/1/2016", "8/6/2015","2/1/2015","6/14/2017",
                                    "6/21/2017","1/24/2018","7/30/2018","7/30/2018"),
                   "Stop Date "=c("1/14/2017","1/14/2017", "1/14/2017","1/14/2017"
                                  ,"1/24/2018","6/29/2018","6/29/2018","Ongoing","Ongoing"),
                   "Dose"=c(12,20,32,3,5,6,6,8,9))
data2<-data.frame("visitdate"=c("8/24/2016","8/24/2016", "10/19/2016","12/7/2016","12/21/2016",
                                "3/22/2017","5/10/2017", "6/14/2017", "7/12/2017","9/27/2017", 
                                "11/29/2017", "1/24/2018","3/21/2018","5/30/2018","8/15/2018",
                                "10/3/2018", "11/28/2018"),
                  "Score"=c(1,2,3,34,6,7,9,5,6,8,5,5,7,9,1,2,5))

I would like to merge these two data in a way that for example on visit date 8/24/2016 tells me how many drugs the patient has taken with its dose plus the clinical score.

Upvotes: 0

Views: 177

Answers (1)

Ben
Ben

Reputation: 30474

There may be some pre-processing of data to consider early on.

First, your example above has column names with spaces, would be nice to avoid. I edited and removed spaces for this example.

In addition, you have "Ongoing" as a date. Would recommend converting to dates using as.Date. However, after converting those with "Ongoing" will be included as NA. These can be set to Inf (infinity) which will work.

For example:

data1$StartDate <- as.Date(data1$StartDate, format = "%m/%d/%Y")
data1$StopDate <- as.Date(data1$StopDate, format = "%m/%d/%Y")
data2$VisitDate <- as.Date(data2$VisitDate, format = "%m/%d/%Y")

data1$StopDate[8:9] <- Inf

There are a number of other ways to address this, depending on your source of data.

After that, you can use tidyverse and fuzzyjoin to do the following. Using fuzzy_left_join you can join your two data frames together, where only the dates within the date range are included.

You might consider leaving the results in long form. However, if you want the wide format, you can use pivot_wider. The final select puts the columns in order numerically as in your example.

library(tidyverse)
library(fuzzyjoin)

fuzzy_left_join(data2,
                data1,
                by = c("VisitDate" = "StartDate",
                       "VisitDate" = "StopDate"),
                match_fun = list(`>=`, `<=`)) %>%
  select(-StartDate, -StopDate) %>%
  group_by(VisitDate, Score) %>%
  mutate(rn = row_number(),
         NumDrugs = ifelse(all(is.na(DrugName)) == TRUE, 0, n())) %>%
  pivot_wider(id_cols = c(VisitDate, Score, NumDrugs), names_from = rn, values_from = c(DrugName, Dose)) %>%
  select(VisitDate, Score, NumDrugs, names(.)[-c(1:3)][order(parse_number(names(.)[-c(1:3)]))])

Output

   VisitDate  Score NumDrugs DrugName_1 Dose_1 DrugName_2 Dose_2 DrugName_3 Dose_3 DrugName_4 Dose_4
   <date>     <dbl>    <dbl> <chr>       <dbl> <chr>       <dbl> <chr>       <dbl> <chr>       <dbl>
 1 2016-08-24     1        4 Drug1          12 Drug1          20 Drug2          32 Drug1           3
 2 2016-08-24     2        4 Drug1          12 Drug1          20 Drug2          32 Drug1           3
 3 2016-10-19     3        4 Drug1          12 Drug1          20 Drug2          32 Drug1           3
 4 2016-12-07    34        4 Drug1          12 Drug1          20 Drug2          32 Drug1           3
 5 2016-12-21     6        4 Drug1          12 Drug1          20 Drug2          32 Drug1           3
 6 2017-03-22     7        0 NA             NA NA             NA NA             NA NA             NA
 7 2017-05-10     9        0 NA             NA NA             NA NA             NA NA             NA
 8 2017-06-14     5        1 Drug3           5 NA             NA NA             NA NA             NA
 9 2017-07-12     6        2 Drug3           5 Drug2           6 NA             NA NA             NA
10 2017-09-27     8        2 Drug3           5 Drug2           6 NA             NA NA             NA
11 2017-11-29     5        2 Drug3           5 Drug2           6 NA             NA NA             NA
12 2018-01-24     5        3 Drug3           5 Drug2           6 Drug4           6 NA             NA
13 2018-03-21     7        2 Drug2           6 Drug4           6 NA             NA NA             NA
14 2018-05-30     9        2 Drug2           6 Drug4           6 NA             NA NA             NA
15 2018-08-15     1        2 Drug5           8 Drug1           9 NA             NA NA             NA
16 2018-10-03     2        2 Drug5           8 Drug1           9 NA             NA NA             NA
17 2018-11-28     5        2 Drug5           8 Drug1           9 NA             NA NA             NA

Data

(Before converting dates)

data1 <- structure(list(DrugName = c("Drug1", "Drug1", "Drug2", "Drug1", 
"Drug3", "Drug2", "Drug4", "Drug5", "Drug1"), StartDate = c("7/1/2016", 
"1/1/2016", "8/6/2015", "2/1/2015", "6/14/2017", "6/21/2017", 
"1/24/2018", "7/30/2018", "7/30/2018"), StopDate = c("1/14/2017", 
"1/14/2017", "1/14/2017", "1/14/2017", "1/24/2018", "6/29/2018", 
"6/29/2018", NA, NA), Dose = c(12, 20, 32, 3, 5, 6, 6, 8, 9)), class = "data.frame", row.names = c(NA, 
-9L))

data2 <- structure(list(VisitDate = c("8/24/2016", "8/24/2016", "10/19/2016", 
"12/7/2016", "12/21/2016", "3/22/2017", "5/10/2017", "6/14/2017", 
"7/12/2017", "9/27/2017", "11/29/2017", "1/24/2018", "3/21/2018", 
"5/30/2018", "8/15/2018", "10/3/2018", "11/28/2018"), Score = c(1, 
2, 3, 34, 6, 7, 9, 5, 6, 8, 5, 5, 7, 9, 1, 2, 5)), class = "data.frame", row.names = c(NA, 
-17L))

Upvotes: 1

Related Questions