gfa2001
gfa2001

Reputation: 277

Merge dataframes by date columns within range from one another

I would like to merge df1 (which contains treatment intervals for patients) with df2 (which contains lab values) by patient ID and by dates such that the date of the lab value is within 5 days from the start date of a medication. See below

For df1:

ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

  ID Medication Start.Date  Stop.Date
   2    aspirin 05/01/2017 05/30/2017
   2    tylenol 05/01/2017 05/15/2017
   3    lipitor 05/06/2017 05/18/2017
   5      advil 05/28/2017 06/13/2017

For df2:

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/17", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

  ID   Lab.date Lab.wbc
   2 04/30/2017     5.4
   2 05/03/2017     3.2
   2 05/15/2017     7.1
   3 05/05/2017     6.0
   3 05/18/2017    10.8
   5 05/15/2017    11.3

The merge should lead to this below where the Lab.date is + or - 5 days from medication start date:

   ID Medication Start.Date Stop.Date  Lab.date   Lab.wbc
   2    aspirin  05/01/2017 05/30/2017 04/30/2017 5.4
   2    aspirin  05/01/2017 05/30/2017 05/03/2017 3.2
   2    tylenol  05/01/2017 05/15/2017 04/30/2017 5.4
   2    tylenol  05/01/2017 05/15/2017 05/03/2017 3.2
   3    lipitor  05/06/2017 05/18/2017 05/05/2017 6.0

Upvotes: 0

Views: 568

Answers (1)

ozanstats
ozanstats

Reputation: 2864

Below is a possible solution. Please see that there are other potential outcomes in the final data frame which you didn't account for at the end of your question.

library(dplyr)

# reproducing your setup
ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/17", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

# having a full join by patient ID
full_df <- full_join(df1, df2, by = "ID")

# note that accurate result should include more rows compared to the one given in the question
result <- full_df %>%
  # including the day difference for your reference
  mutate(Day.diff = abs(as.Date(Start.Date, "%m/%d/%Y") - as.Date(Lab.date, "%m/%d/%Y"))) %>%
  # filtering the data frame to keep the difference within 5 days
  filter(Day.diff <= 5)

Upvotes: 1

Related Questions