Mayur
Mayur

Reputation: 93

How to extract date based on condition over two different variables in R

I have a dataset of 100 observations which contain patient id,drugcode,prescription date. I want to create a new column "index date" which is the date when the patient changed drug for the third time.

PatientID   DrugCode    Prescriptiondate   
A1  3   07-08-2014   
A1  3   08-09-2014   
A1  7   19-09-2014   
A1  5   30-09-2014  
A2  4   11-07-2014  
A2  4   21-07-2014  
A2  3   13-08-2014  
A2  5   26-08-2014  
A2  5   30-09-2014  
A3  2   16-08-2014  
A3  3   17-09-2014  
A4  5   08-06-2014  
A4  5   29-06-2014  
A4  6   20-08-2014  
A4  6   24-09-2014  
A4  4   22-10-2014  
A4  4   25-10-2014   

The data set should look like this:

PatientID   DrugCode    Prescriptiondate    IndexDate  
A1  3   07-08-2014  30-09-2014  
A1  3   08-09-2014  30-09-2014  
A1  7   19-09-2014  30-09-2014  
A1  5   30-09-2014  30-09-2014  
A2  4   11-07-2014  26-08-2014  
A2  4   21-07-2014  26-08-2014  
A2  3   13-08-2014  26-08-2014  
A2  5   26-08-2014  26-08-2014  
A2  5   30-09-2014  26-08-2014  
A3  2   16-08-2014  NA  
A3  3   17-09-2014  NA  
A4  5   08-06-2014  22-10-2014  
A4  5   29-06-2014  22-10-2014  
A4  6   20-08-2014  22-10-2014  
A4  6   24-09-2014  22-10-2014  
A4  4   22-10-2014  22-10-2014  
A4  4   25-10-2014  22-10-2014 

In the above case,patient A1 & A2 changed the drug third time to drug 5 on 30-09-2014 and 26-08-2014 respectively;A3 have not changed drug third time and A4 has changed to drug 4 on 22-10-2014, so the index date should be 30-09-2014,26-08-2014,NA,22-10-2014 respectively.

Please if anyone can assist in writing the code for such problem.

Upvotes: 1

Views: 51

Answers (2)

bgoldst
bgoldst

Reputation: 35324

Here's a base R solution, shamelessly stealing Pierre Lafortune's brilliant match-unique idea:

df <- data.frame(PatientID=c('A1','A1','A1','A1','A2','A2','A2','A2','A2','A3','A3','A4','A4','A4','A4','A4','A4'),DrugCode=c(3,3,7,5,4,4,3,5,5,2,3,5,5,6,6,4,4),Prescriptiondate=as.Date(c('07-08-2014','08-09-2014','19-09-2014','30-09-2014','11-07-2014','21-07-2014','13-08-2014','26-08-2014','30-09-2014','16-08-2014','17-09-2014','08-06-2014','29-06-2014','20-08-2014','24-09-2014','22-10-2014','25-10-2014'),'%d-%m-%Y'));
df$IndexDate <- do.call('c',by(df,df$PatientID,function(g) rep(g$Prescriptiondate[match(unique(g$DrugCode)[3],g$DrugCode)],nrow(g))));
df;
##    PatientID DrugCode Prescriptiondate  IndexDate
## 1         A1        3       2014-08-07 2014-09-30
## 2         A1        3       2014-09-08 2014-09-30
## 3         A1        7       2014-09-19 2014-09-30
## 4         A1        5       2014-09-30 2014-09-30
## 5         A2        4       2014-07-11 2014-08-26
## 6         A2        4       2014-07-21 2014-08-26
## 7         A2        3       2014-08-13 2014-08-26
## 8         A2        5       2014-08-26 2014-08-26
## 9         A2        5       2014-09-30 2014-08-26
## 10        A3        2       2014-08-16       <NA>
## 11        A3        3       2014-09-17       <NA>
## 12        A4        5       2014-06-08 2014-10-22
## 13        A4        5       2014-06-29 2014-10-22
## 14        A4        6       2014-08-20 2014-10-22
## 15        A4        6       2014-09-24 2014-10-22
## 16        A4        4       2014-10-22 2014-10-22
## 17        A4        4       2014-10-25 2014-10-22

Upvotes: 0

Pierre L
Pierre L

Reputation: 28461

This is a possible dplyr solution:

df %>% group_by(PatientID) %>% mutate(IndexDate = Prescriptiondate[match(unique(DrugCode)[3], DrugCode)])
# Source: local data frame [17 x 4]
# Groups: PatientID
# 
#    PatientID DrugCode Prescriptiondate  IndexDate
# 1         A1        3       07-08-2014 30-09-2014
# 2         A1        3       08-09-2014 30-09-2014
# 3         A1        7       19-09-2014 30-09-2014
# 4         A1        5       30-09-2014 30-09-2014
# 5         A2        4       11-07-2014 26-08-2014
# 6         A2        4       21-07-2014 26-08-2014
# 7         A2        3       13-08-2014 26-08-2014
# 8         A2        5       26-08-2014 26-08-2014
# 9         A2        5       30-09-2014 26-08-2014
# 10        A3        2       16-08-2014         NA
# 11        A3        3       17-09-2014         NA
# 12        A4        5       08-06-2014 22-10-2014
# 13        A4        5       29-06-2014 22-10-2014
# 14        A4        6       20-08-2014 22-10-2014
# 15        A4        6       24-09-2014 22-10-2014
# 16        A4        4       22-10-2014 22-10-2014
# 17        A4        4       25-10-2014 22-10-2014

I guess it's the same idea with data.table

dt[, IndexDate := Prescriptiondate[match(unique(DrugCode)[3], DrugCode)], PatientID]
#    PatientID DrugCode Prescriptiondate  IndexDate
#  1:        A1        3       07-08-2014 30-09-2014
#  2:        A1        3       08-09-2014 30-09-2014
#  3:        A1        7       19-09-2014 30-09-2014
#  4:        A1        5       30-09-2014 30-09-2014
#  5:        A2        4       11-07-2014 26-08-2014
#  6:        A2        4       21-07-2014 26-08-2014
#  7:        A2        3       13-08-2014 26-08-2014
#  8:        A2        5       26-08-2014 26-08-2014
#  9:        A2        5       30-09-2014 26-08-2014
# 10:        A3        2       16-08-2014         NA
# 11:        A3        3       17-09-2014         NA
# 12:        A4        5       08-06-2014 22-10-2014
# 13:        A4        5       29-06-2014 22-10-2014
# 14:        A4        6       20-08-2014 22-10-2014
# 15:        A4        6       24-09-2014 22-10-2014
# 16:        A4        4       22-10-2014 22-10-2014
# 17:        A4        4       25-10-2014 22-10-2014

match works because it stops once it finds a match. So if a drug is used over many days or one, it will not change the outcome. We look for the first instance of the DrugCode changing for the third time. unique works because it arranges it's values in the order that they appear. So unique(x)[3] will give the third change in that value.

Upvotes: 1

Related Questions