Reputation: 93
I have a dataset of 100 observations which contain patient id,drugcode,prescription date. I want to create a new column "index date" which is the date when the patient changed drug for the third time.
PatientID DrugCode Prescriptiondate
A1 3 07-08-2014
A1 3 08-09-2014
A1 7 19-09-2014
A1 5 30-09-2014
A2 4 11-07-2014
A2 4 21-07-2014
A2 3 13-08-2014
A2 5 26-08-2014
A2 5 30-09-2014
A3 2 16-08-2014
A3 3 17-09-2014
A4 5 08-06-2014
A4 5 29-06-2014
A4 6 20-08-2014
A4 6 24-09-2014
A4 4 22-10-2014
A4 4 25-10-2014
The data set should look like this:
PatientID DrugCode Prescriptiondate IndexDate
A1 3 07-08-2014 30-09-2014
A1 3 08-09-2014 30-09-2014
A1 7 19-09-2014 30-09-2014
A1 5 30-09-2014 30-09-2014
A2 4 11-07-2014 26-08-2014
A2 4 21-07-2014 26-08-2014
A2 3 13-08-2014 26-08-2014
A2 5 26-08-2014 26-08-2014
A2 5 30-09-2014 26-08-2014
A3 2 16-08-2014 NA
A3 3 17-09-2014 NA
A4 5 08-06-2014 22-10-2014
A4 5 29-06-2014 22-10-2014
A4 6 20-08-2014 22-10-2014
A4 6 24-09-2014 22-10-2014
A4 4 22-10-2014 22-10-2014
A4 4 25-10-2014 22-10-2014
In the above case,patient A1 & A2 changed the drug third time to drug 5 on 30-09-2014 and 26-08-2014 respectively;A3 have not changed drug third time and A4 has changed to drug 4 on 22-10-2014, so the index date should be 30-09-2014,26-08-2014,NA,22-10-2014 respectively.
Please if anyone can assist in writing the code for such problem.
Upvotes: 1
Views: 51
Reputation: 35324
Here's a base R solution, shamelessly stealing Pierre Lafortune's brilliant match-unique idea:
df <- data.frame(PatientID=c('A1','A1','A1','A1','A2','A2','A2','A2','A2','A3','A3','A4','A4','A4','A4','A4','A4'),DrugCode=c(3,3,7,5,4,4,3,5,5,2,3,5,5,6,6,4,4),Prescriptiondate=as.Date(c('07-08-2014','08-09-2014','19-09-2014','30-09-2014','11-07-2014','21-07-2014','13-08-2014','26-08-2014','30-09-2014','16-08-2014','17-09-2014','08-06-2014','29-06-2014','20-08-2014','24-09-2014','22-10-2014','25-10-2014'),'%d-%m-%Y'));
df$IndexDate <- do.call('c',by(df,df$PatientID,function(g) rep(g$Prescriptiondate[match(unique(g$DrugCode)[3],g$DrugCode)],nrow(g))));
df;
## PatientID DrugCode Prescriptiondate IndexDate
## 1 A1 3 2014-08-07 2014-09-30
## 2 A1 3 2014-09-08 2014-09-30
## 3 A1 7 2014-09-19 2014-09-30
## 4 A1 5 2014-09-30 2014-09-30
## 5 A2 4 2014-07-11 2014-08-26
## 6 A2 4 2014-07-21 2014-08-26
## 7 A2 3 2014-08-13 2014-08-26
## 8 A2 5 2014-08-26 2014-08-26
## 9 A2 5 2014-09-30 2014-08-26
## 10 A3 2 2014-08-16 <NA>
## 11 A3 3 2014-09-17 <NA>
## 12 A4 5 2014-06-08 2014-10-22
## 13 A4 5 2014-06-29 2014-10-22
## 14 A4 6 2014-08-20 2014-10-22
## 15 A4 6 2014-09-24 2014-10-22
## 16 A4 4 2014-10-22 2014-10-22
## 17 A4 4 2014-10-25 2014-10-22
Upvotes: 0
Reputation: 28461
This is a possible dplyr
solution:
df %>% group_by(PatientID) %>% mutate(IndexDate = Prescriptiondate[match(unique(DrugCode)[3], DrugCode)])
# Source: local data frame [17 x 4]
# Groups: PatientID
#
# PatientID DrugCode Prescriptiondate IndexDate
# 1 A1 3 07-08-2014 30-09-2014
# 2 A1 3 08-09-2014 30-09-2014
# 3 A1 7 19-09-2014 30-09-2014
# 4 A1 5 30-09-2014 30-09-2014
# 5 A2 4 11-07-2014 26-08-2014
# 6 A2 4 21-07-2014 26-08-2014
# 7 A2 3 13-08-2014 26-08-2014
# 8 A2 5 26-08-2014 26-08-2014
# 9 A2 5 30-09-2014 26-08-2014
# 10 A3 2 16-08-2014 NA
# 11 A3 3 17-09-2014 NA
# 12 A4 5 08-06-2014 22-10-2014
# 13 A4 5 29-06-2014 22-10-2014
# 14 A4 6 20-08-2014 22-10-2014
# 15 A4 6 24-09-2014 22-10-2014
# 16 A4 4 22-10-2014 22-10-2014
# 17 A4 4 25-10-2014 22-10-2014
I guess it's the same idea with data.table
dt[, IndexDate := Prescriptiondate[match(unique(DrugCode)[3], DrugCode)], PatientID]
# PatientID DrugCode Prescriptiondate IndexDate
# 1: A1 3 07-08-2014 30-09-2014
# 2: A1 3 08-09-2014 30-09-2014
# 3: A1 7 19-09-2014 30-09-2014
# 4: A1 5 30-09-2014 30-09-2014
# 5: A2 4 11-07-2014 26-08-2014
# 6: A2 4 21-07-2014 26-08-2014
# 7: A2 3 13-08-2014 26-08-2014
# 8: A2 5 26-08-2014 26-08-2014
# 9: A2 5 30-09-2014 26-08-2014
# 10: A3 2 16-08-2014 NA
# 11: A3 3 17-09-2014 NA
# 12: A4 5 08-06-2014 22-10-2014
# 13: A4 5 29-06-2014 22-10-2014
# 14: A4 6 20-08-2014 22-10-2014
# 15: A4 6 24-09-2014 22-10-2014
# 16: A4 4 22-10-2014 22-10-2014
# 17: A4 4 25-10-2014 22-10-2014
match
works because it stops once it finds a match. So if a drug is used over many days or one, it will not change the outcome. We look for the first instance of the DrugCode changing for the third time. unique
works because it arranges it's values in the order that they appear. So unique(x)[3]
will give the third change in that value.
Upvotes: 1