Reputation: 151
I want to use the last non-NA value from the DISPENSED_DURATION column and add that to the DISPENSED_DATE column to get the final LAST_DATE column within each ID group.
I am currently looking at something like copy[,.SD[.N],ID] to get the last row, but can't figure out how to skip those NA's then add it back to DISPENSED_DATE.
Here is the sample code:
dt = data.table(
ID = c(1,1,1,1,1,2,2,2,2,2),
DATE = c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05", "2020-01-06","2020-01-07","2020-01-08","2020-01-09","2020-01-10"),
PRESCRIBED_DATE = c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05", "2020-01-06","2020-01-07","2020-01-08", NA,"2020-01-10"),
DISPENSED_DATE = c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05", "2020-01-06","2020-01-07","2020-01-08", "2020-01-09","NA"),
DISPENSED_DURATION = c(5,5,5,5,5,6,6,6,6,NA)
)
ID PRESCRIBED_DATE DISPENSED_DATE DISPENSED_DURATION
1: 1 2020-01-01 2020-01-01 5
2: 1 2020-01-02 2020-01-02 5
3: 1 2020-01-03 2020-01-03 5
4: 1 2020-01-04 2020-01-04 5
5: 1 2020-01-05 2020-01-05 5
6: 2 2020-01-06 2020-01-06 6
7: 2 2020-01-07 2020-01-07 6
8: 2 2020-01-08 2020-01-08 6
9: 2 2020-01-09 2020-01-09 6
10: 2 2020-01-10 <NA> NA
Expected result:
ID PRESCRIBED_DATE DISPENSED_DATE DISPENSED_DURATION LAST_DATE
1: 1 2020-01-01 2020-01-01 5 <NA>
2: 1 2020-01-02 2020-01-02 5 <NA>
3: 1 2020-01-03 2020-01-03 5 <NA>
4: 1 2020-01-04 2020-01-04 5 <NA>
5: 1 2020-01-05 2020-01-05 5 2020-01-10
6: 2 2020-01-06 2020-01-06 6 <NA>
7: 2 2020-01-07 2020-01-07 6 <NA>
8: 2 2020-01-08 2020-01-08 6 <NA>
9: 2 2020-01-09 2020-01-09 6 2020-01-15
10: 2 2020-01-10 <NA> NA <NA>
Thank you!
Upvotes: 0
Views: 43
Reputation: 2419
I get one compact solution.
NA
of DISPENSED_DURATION
LAST_DATE
with fcase
only for last row of each group of ID
dt[!is.na(DISPENSED_DURATION),
LAST_DATE:=fcase(rleidv(DATE) == .N,
as.Date(DISPENSED_DATE) + DISPENSED_DURATION),
by = ID]
Result:
# ID DATE PRESCRIBED_DATE DISPENSED_DATE DISPENSED_DURATION LAST_DATE
# 1: 1 2020-01-01 2020-01-01 2020-01-01 5 <NA>
# 2: 1 2020-01-02 2020-01-02 2020-01-02 5 <NA>
# 3: 1 2020-01-03 2020-01-03 2020-01-03 5 <NA>
# 4: 1 2020-01-04 2020-01-04 2020-01-04 5 <NA>
# 5: 1 2020-01-05 2020-01-05 2020-01-05 5 2020-01-10
# 6: 2 2020-01-06 2020-01-06 2020-01-06 6 <NA>
# 7: 2 2020-01-07 2020-01-07 2020-01-07 6 <NA>
# 8: 2 2020-01-08 2020-01-08 2020-01-08 6 <NA>
# 9: 2 2020-01-09 <NA> 2020-01-09 6 2020-01-15
#10: 2 2020-01-10 2020-01-10 NA NA <NA>
ps:
The "NA"
in DISPENSED_DATE
should be NA
.
Upvotes: 2
Reputation: 33603
# First make sure your data is properly defined, "NA" is not same as NA
# and you can't add characters
dt[, DISPENSED_DATE := as.Date(DISPENSED_DATE)]
# Now select the relevant rows and add the two columns:
dt[dt[, last(.I[!is.na(DISPENSED_DATE)]), by = ID]$V1,
LAST_DATE := DISPENSED_DATE + DISPENSED_DURATION]
# ID DATE PRESCRIBED_DATE DISPENSED_DATE DISPENSED_DURATION LAST_DATE
# 1: 1 2020-01-01 2020-01-01 2020-01-01 5 <NA>
# 2: 1 2020-01-02 2020-01-02 2020-01-02 5 <NA>
# 3: 1 2020-01-03 2020-01-03 2020-01-03 5 <NA>
# 4: 1 2020-01-04 2020-01-04 2020-01-04 5 <NA>
# 5: 1 2020-01-05 2020-01-05 2020-01-05 5 2020-01-10
# 6: 2 2020-01-06 2020-01-06 2020-01-06 6 <NA>
# 7: 2 2020-01-07 2020-01-07 2020-01-07 6 <NA>
# 8: 2 2020-01-08 2020-01-08 2020-01-08 6 <NA>
# 9: 2 2020-01-09 <NA> 2020-01-09 6 2020-01-15
# 10: 2 2020-01-10 2020-01-10 <NA> NA <NA>
Upvotes: 1
Reputation: 389235
Convert the dates to date class first, create an empty Date
column (LAST_DATE
). Add last non NA
DISPENSED_DATE
to corresponding DISPENSED_DURATION
for each ID
.
library(data.table)
dt[, (2:4) := lapply(.SD, as.Date), .SDcols = 2:4]
dt[, LAST_DATE := as.Date(NA)]
dt[, LAST_DATE := {
inds = max(which(!is.na(DISPENSED_DATE)))
LAST_DATE[inds] =DISPENSED_DATE[inds] + DISPENSED_DURATION[inds]
LAST_DATE
}, ID]
dt
# ID DATE PRESCRIBED_DATE DISPENSED_DATE DISPENSED_DURATION LAST_DATE
# 1: 1 2020-01-01 2020-01-01 2020-01-01 5 <NA>
# 2: 1 2020-01-02 2020-01-02 2020-01-02 5 <NA>
# 3: 1 2020-01-03 2020-01-03 2020-01-03 5 <NA>
# 4: 1 2020-01-04 2020-01-04 2020-01-04 5 <NA>
# 5: 1 2020-01-05 2020-01-05 2020-01-05 5 2020-01-10
# 6: 2 2020-01-06 2020-01-06 2020-01-06 6 <NA>
# 7: 2 2020-01-07 2020-01-07 2020-01-07 6 <NA>
# 8: 2 2020-01-08 2020-01-08 2020-01-08 6 <NA>
# 9: 2 2020-01-09 <NA> 2020-01-09 6 2020-01-15
#10: 2 2020-01-10 2020-01-10 <NA> NA <NA>
Upvotes: 1