How to extract date based on condition over two different variables in R

Question

I have a dataset of 100 observations which contain patient id,drugcode,prescription date. I want to create a new column "index date" which is the date when the patient changed drug for the third time.

PatientID   DrugCode    Prescriptiondate   
A1  3   07-08-2014   
A1  3   08-09-2014   
A1  7   19-09-2014   
A1  5   30-09-2014  
A2  4   11-07-2014  
A2  4   21-07-2014  
A2  3   13-08-2014  
A2  5   26-08-2014  
A2  5   30-09-2014  
A3  2   16-08-2014  
A3  3   17-09-2014  
A4  5   08-06-2014  
A4  5   29-06-2014  
A4  6   20-08-2014  
A4  6   24-09-2014  
A4  4   22-10-2014  
A4  4   25-10-2014

The data set should look like this:

PatientID   DrugCode    Prescriptiondate    IndexDate  
A1  3   07-08-2014  30-09-2014  
A1  3   08-09-2014  30-09-2014  
A1  7   19-09-2014  30-09-2014  
A1  5   30-09-2014  30-09-2014  
A2  4   11-07-2014  26-08-2014  
A2  4   21-07-2014  26-08-2014  
A2  3   13-08-2014  26-08-2014  
A2  5   26-08-2014  26-08-2014  
A2  5   30-09-2014  26-08-2014  
A3  2   16-08-2014  NA  
A3  3   17-09-2014  NA  
A4  5   08-06-2014  22-10-2014  
A4  5   29-06-2014  22-10-2014  
A4  6   20-08-2014  22-10-2014  
A4  6   24-09-2014  22-10-2014  
A4  4   22-10-2014  22-10-2014  
A4  4   25-10-2014  22-10-2014

In the above case,patient A1 & A2 changed the drug third time to drug 5 on 30-09-2014 and 26-08-2014 respectively;A3 have not changed drug third time and A4 has changed to drug 4 on 22-10-2014, so the index date should be 30-09-2014,26-08-2014,NA,22-10-2014 respectively.

Please if anyone can assist in writing the code for such problem.

bgoldst · Accepted Answer

Here's a base R solution, shamelessly stealing Pierre Lafortune's brilliant match-unique idea:

df <- data.frame(PatientID=c('A1','A1','A1','A1','A2','A2','A2','A2','A2','A3','A3','A4','A4','A4','A4','A4','A4'),DrugCode=c(3,3,7,5,4,4,3,5,5,2,3,5,5,6,6,4,4),Prescriptiondate=as.Date(c('07-08-2014','08-09-2014','19-09-2014','30-09-2014','11-07-2014','21-07-2014','13-08-2014','26-08-2014','30-09-2014','16-08-2014','17-09-2014','08-06-2014','29-06-2014','20-08-2014','24-09-2014','22-10-2014','25-10-2014'),'%d-%m-%Y'));
df$IndexDate <- do.call('c',by(df,df$PatientID,function(g) rep(g$Prescriptiondate[match(unique(g$DrugCode)[3],g$DrugCode)],nrow(g))));
df;
##    PatientID DrugCode Prescriptiondate  IndexDate
## 1         A1        3       2014-08-07 2014-09-30
## 2         A1        3       2014-09-08 2014-09-30
## 3         A1        7       2014-09-19 2014-09-30
## 4         A1        5       2014-09-30 2014-09-30
## 5         A2        4       2014-07-11 2014-08-26
## 6         A2        4       2014-07-21 2014-08-26
## 7         A2        3       2014-08-13 2014-08-26
## 8         A2        5       2014-08-26 2014-08-26
## 9         A2        5       2014-09-30 2014-08-26
## 10        A3        2       2014-08-16       
## 11        A3        3       2014-09-17       
## 12        A4        5       2014-06-08 2014-10-22
## 13        A4        5       2014-06-29 2014-10-22
## 14        A4        6       2014-08-20 2014-10-22
## 15        A4        6       2014-09-24 2014-10-22
## 16        A4        4       2014-10-22 2014-10-22
## 17        A4        4       2014-10-25 2014-10-22

How to extract date based on condition over two different variables in R

Answers (2)

Related Questions