Reputation: 463
I have the following lines of code:
DT[flag==T, temp:=haz_1.5]
DT[, temp:= na.locf(temp, na.rm = FALSE), "pid"]
DT[agedays==61, haz_1.5_1:=temp]
I need to convert this into a function so that it works on a list of variables rather than a single one. I recently learned how to use lapply to pass a list of columns and conditions to create one set of new columns. However, I'm unsure how to do this when I am passing a list of columns and also need to carry values forward within those columns.
For instance, I can code the following:
columns<-c("haz_1.5", "waz_1.5")
new_cols <- paste(columns, "1", sep = "_")
x=61
maled_anthro[(flag==TRUE)&(agedays==x), (new_cols) := lapply(.SD, function(y) na.locf(y, na.rm=F)), .SDcols = columns]
But this is missing the na.locf step from my original code, so I am not getting the same output as with the original lines. How would I incorporate the line that uses na.locf to carry values forward (DT[, temp:= na.locf(temp, na.rm = FALSE), "pid"]) so that everything is wrapped up in a single function? Would this work with lapply in the same manner?
Dummy data that's similar to the data table I'm using:
DT <- data.table(pid = c(1,1,2,3,3,4,4,5,5,5),
flag = c(T,T,F,T,T,F,T,T,T,T),
agedays = c(1,61,61,51,61,23,61,1,32,61),
haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4),
waz_1.5 = c(1,NA,NA,NA,NA,2,2,3,4,4))
Upvotes: 2
Views: 2059
Reputation: 42564
OP's code can be turned into an anonymous function which is applied to the selected columns:
library(data.table)
columns <- c("haz_1.5", "waz_1.5")
new_cols <- paste0(columns, "_1")
x <- 61
DT[, (new_cols) := lapply(.SD, function(v) {
temp <- fifelse(flag, v, NA_real_)
temp <- nafill(temp, "locf")
fifelse(agedays == x, temp, NA_real_)
}), .SDcols = columns, by = pid][]
    pid  flag agedays haz_1.5 waz_1.5 haz_1.5_1 waz_1.5_1
 1:   1  TRUE       1       1       1        NA        NA
 2:   1  TRUE      61       1      NA         1         1
 3:   2 FALSE      61       1      NA        NA        NA
 4:   3  TRUE      51       2      NA        NA        NA
 5:   3  TRUE      61      NA      NA         2        NA
 6:   4 FALSE      23       1       2        NA        NA
 7:   4  TRUE      61       3       2         3         2
 8:   5  TRUE       1       2       3        NA        NA
 9:   5  TRUE      32       3       4        NA        NA
10:   5  TRUE      61       4       4         4         4
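If a named, reusable function is preferred over the anonymous one, the same steps could be wrapped roughly as follows. This is only a sketch: add_carried_cols() and its arguments (dt, cols, day, suffix) are hypothetical names, not part of data.table. It assumes that the selected columns are of type double and that the table has columns named pid, flag and agedays, as in the dummy data.

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each column in `cols`,
# keep the values where `flag` is TRUE, carry them forward within each `pid`,
# and store the result only in rows where `agedays == day`.
# Assumes double columns; fifelse() requires yes/no to have the same type.
add_carried_cols <- function(dt, cols, day, suffix = "_1") {
  new_cols <- paste0(cols, suffix)
  dt[, (new_cols) := lapply(.SD, function(v) {
    temp <- fifelse(flag, v, NA_real_)       # blank out unflagged values
    temp <- nafill(temp, type = "locf")      # carry forward within the group
    fifelse(agedays == day, temp, NA_real_)  # keep only the target day
  }), .SDcols = cols, by = pid]
  invisible(dt)                              # dt was modified by reference
}

# Usage with the dummy data from the question
DT <- data.table(pid = c(1,1,2,3,3,4,4,5,5,5),
                 flag = c(T,T,F,T,T,F,T,T,T,T),
                 agedays = c(1,61,61,51,61,23,61,1,32,61),
                 haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4),
                 waz_1.5 = c(1,NA,NA,NA,NA,2,2,3,4,4))
add_carried_cols(DT, c("haz_1.5", "waz_1.5"), day = 61)
```

Because := assigns by reference, DT itself gains the new columns; the function returns the table invisibly only so that calls can be chained.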
This is the same result we would get when we manually repeat OP's code for the two columns (note that the temp column must be cleared before parts of it are assigned by reference again):
DT[(flag), temp := haz_1.5]
DT[, temp := zoo::na.locf(temp, na.rm = FALSE), by = pid]
DT[agedays == 61, haz_1.5_1 := temp]
DT[, temp := NULL]
DT[(flag), temp := waz_1.5]
DT[, temp := zoo::na.locf(temp, na.rm = FALSE), by = pid]
DT[agedays == 61, waz_1.5_1 := temp]
DT[, temp := NULL][]
    pid  flag agedays haz_1.5 waz_1.5 haz_1.5_1 waz_1.5_1
 1:   1  TRUE       1       1       1        NA        NA
 2:   1  TRUE      61       1      NA         1         1
 3:   2 FALSE      61       1      NA        NA        NA
 4:   3  TRUE      51       2      NA        NA        NA
 5:   3  TRUE      61      NA      NA         2        NA
 6:   4 FALSE      23       1       2        NA        NA
 7:   4  TRUE      61       3       2         3         2
 8:   5  TRUE       1       2       3        NA        NA
 9:   5  TRUE      32       3       4        NA        NA
10:   5  TRUE      61       4       4         4         4
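The manual repetition could also be written as a loop over the column names. A minimal sketch, assuming the dummy data from the question: old and new are illustrative variable names, and nafill() is swapped in for zoo::na.locf(..., na.rm = FALSE) so that only data.table is needed.

```r
library(data.table)

DT <- data.table(pid = c(1,1,2,3,3,4,4,5,5,5),
                 flag = c(T,T,F,T,T,F,T,T,T,T),
                 agedays = c(1,61,61,51,61,23,61,1,32,61),
                 haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4),
                 waz_1.5 = c(1,NA,NA,NA,NA,2,2,3,4,4))

columns  <- c("haz_1.5", "waz_1.5")
new_cols <- paste0(columns, "_1")
x <- 61

for (k in seq_along(columns)) {
  old <- columns[k]   # source column name (illustrative variable)
  new <- new_cols[k]  # target column name (illustrative variable)
  DT[(flag), temp := get(old)]                         # copy flagged values
  DT[, temp := nafill(temp, type = "locf"), by = pid]  # carry forward per pid
  DT[agedays == x, (new) := temp]                      # keep rows at day x
  DT[, temp := NULL]                                   # clear before next pass
}
```

The loop makes four passes over the table per column, so the lapply(.SD, ...) form is usually the more compact choice; the loop mainly shows that both approaches perform the same steps.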
Here, the whole operation is grouped by pid. In OP's code, the first and last assignments work on the ungrouped (full) vectors, which might be somewhat more efficient. However, the result of those assignments is independent of pid, so the result is the same.
Instead of zoo::na.locf(), data.table's nafill() function is used (new with data.table v1.12.4, on CRAN 03 Oct 2019).
DT[(flag), ...] is equivalent to DT[flag == TRUE, ...].
If fifelse() is used instead of subsetted assignment by reference, the no parameter must be NA to be compliant. Thus, DT[, temp := fifelse(flag, haz_1.5, NA_real_)][] is equivalent to DT[(flag), temp := haz_1.5][].
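These equivalences can be checked directly on the dummy data. The following is a small sketch rather than a full test: A, B and filled are illustrative names, and the checks use only data.table, so the comparison with zoo::na.locf() itself is not run here.

```r
library(data.table)

DT <- data.table(pid = c(1,1,2,3,3,4,4,5,5,5),
                 flag = c(T,T,F,T,T,F,T,T,T,T),
                 haz_1.5 = c(1,1,1,2,NA,1,3,2,3,4))

# Subsetted assignment by reference vs. fifelse() with NA as the `no` value
A <- copy(DT)[(flag), temp := haz_1.5]
B <- copy(DT)[, temp := fifelse(flag, haz_1.5, NA_real_)]
stopifnot(identical(A$temp, B$temp))

# DT[(flag), ...] vs. DT[flag == TRUE, ...]
stopifnot(identical(DT[(flag), haz_1.5], DT[flag == TRUE, haz_1.5]))

# nafill(type = "locf") leaves leading NAs in place and fills later ones
filled <- nafill(c(NA, 1, NA, 3, NA), type = "locf")
stopifnot(identical(filled, c(NA, 1, 1, 3, 3)))
```

copy() is used so that each variant starts from an unmodified table, since := would otherwise add temp to DT itself.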
Upvotes: 3