Reputation: 277
I would like to merge df1 (which contains treatment intervals for patients) with df2 (which contains lab values) by patient ID and by dates such that the date of the lab value is within 5 days from the start date of a medication. See below
For df1:
ID = c(2, 2, 2, 2, 3, 5)
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil")
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date)
ID Medication Start.Date Stop.Date
2 aspirin 05/01/2017 05/30/2017
2 tylenol 05/01/2017 05/15/2017
3 lipitor 05/06/2017 05/18/2017
5 advil 05/28/2017 06/13/2017
For df2:
ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/17", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)
ID Lab.date Lab.wbc
2 04/30/2017 5.4
2 05/03/2017 3.2
2 05/15/2017 7.1
3 05/05/2017 6.0
3 05/18/2017 10.8
5 05/15/2017 11.3
The merge should lead to this below where the Lab.date is + or - 5 days from medication start date:
ID Medication Start.Date Stop.Date Lab.date Lab.wbc
2 aspirin 05/01/2017 05/30/2017 04/30/2017 5.4
2 aspirin 05/01/2017 05/30/2017 05/03/2017 3.2
2 tylenol 05/01/2017 05/15/2017 04/30/2017 5.4
2 tylenol 05/01/2017 05/15/2017 05/03/2017 3.2
3 lipitor 05/06/2017 05/18/2017 05/05/2017 6.0
Upvotes: 0
Views: 568
Reputation: 2864
Below is a possible solution. Please see that there are other potential outcomes in the final data frame which you didn't account for at the end of your question.
library(dplyr)
# reproducing your setup
ID = c(2, 2, 2, 2, 3, 5)
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil")
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date)
ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/17", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)
# having a full join by patient ID
full_df <- full_join(df1, df2, by = "ID")
# note that accurate result should include more rows compared to the one given in the question
result <- full_df %>%
# including the day difference for your reference
mutate(Day.diff = abs(as.Date(Start.Date, "%m/%d/%Y") - as.Date(Lab.date, "%m/%d/%Y"))) %>%
# filtering the data frame to keep the difference within 5 days
filter(Day.diff <= 5)
Upvotes: 1