Reputation: 35
I have seen examples of using .SD with lapply in data.table with a simple function, as below:
DT[ , c("b","d","e") := lapply(.SD, tan), .SDcols = c("b","d","e")]
But I'm unsure how to use column-specific arguments in a multi-argument function. For instance, I have a winsorize function that I want to apply to a subset of columns in a data.table, but using column-specific percentiles, e.g.
library(DescTools)
wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95))
DT[ , .(b,c) :=lapply(.SD, function(x)
{winsorize(x,wlevel$zzz$lower,wlevel$zzz$upper)}), .SDcols = .(b,c)]
Here zzz stands for the column currently being iterated over. I have also seen threads on passing varying arguments with lapply, but not in the context of data.table with .SDcols. Is this possible to do?
This is a toy example; I'm looking to generalize it to an arbitrarily large number of columns. Looping is always an option, but I'm trying to see if there's a more elegant/efficient solution.
Upvotes: 3
Views: 1390
Reputation: 33938
How to use column-specific arguments in a multiple argument function?
Use mapply(FUN, dat, params1, params2, ...) where each of params1, params2, ... can be a list or vector; mapply iterates over dat, params1, params2, ... in parallel, element by element.
Note that, unlike the rest of the apply/lapply/sapply family, with mapply the function argument comes first, then the data and parameter(s).
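To make the parallel iteration concrete, here is a minimal base-R sketch. The names dat, lower, upper and the clamping function are made up for illustration; they are not from the question.

```r
# Toy inputs: two "columns" and one lower/upper bound per column.
dat   <- list(b = c(1, 5, 10), c = c(2, 4, 6))
lower <- list(b = 2, c = 3)
upper <- list(b = 8, c = 5)

# mapply calls the function once per position:
# first with (dat$b, 2, 8), then with (dat$c, 3, 5).
# SIMPLIFY = FALSE keeps the result a list instead of collapsing it to a matrix.
clamped <- mapply(function(x, lo, hi) pmin(pmax(x, lo), hi),
                  dat, lower, upper, SIMPLIFY = FALSE)

print(clamped)
```

Elements are paired by position, not by name, so the parameter lists need to be in the same order as the data.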
In your case, something like the following (pseudo-code; you'll need to tweak it to get it to run).
Instead of your nested list wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95)), it's probably easier to unpack it into two lists:
w_lower <- list(b=0.01, c=0.02)
w_upper <- list(b=0.99, c=0.95)
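# SIMPLIFY = FALSE makes mapply return a list of columns, which := expects;
# the default would collapse equal-length results into a matrix.
DT[ , c('b','c') := mapply(function(x, w_lower_col, w_upper_col) { winsorize(x, w_lower_col, w_upper_col) },
.SD, w_lower, w_upper, SIMPLIFY = FALSE), .SDcols = c('b', 'c')]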
We shouldn't need to use column names (your zzz) to index into the lists; mapply() just iterates over them as-is, pairing elements by position. That means w_lower and w_upper must be in the same order as .SDcols.
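For what it's worth, here is a self-contained sketch of the same idea that should run as-is. It uses Map(), which is mapply() with SIMPLIFY = FALSE. The helper winsorize_q is a hypothetical stand-in that clamps a vector to its own quantiles; it is not the DescTools function, so swap in your own winsorize and its argument names.

```r
library(data.table)

# Hypothetical stand-in for the asker's winsorize(): clamp x to its own
# lower/upper quantiles. Not the DescTools implementation.
winsorize_q <- function(x, lower, upper) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

set.seed(1)
DT <- data.table(a = 1:100, b = rnorm(100), c = runif(100))

cols    <- c("b", "c")
w_lower <- c(b = 0.01, c = 0.02)
w_upper <- c(b = 0.99, c = 0.95)

# Index the parameter vectors by `cols` so they line up with .SD by position.
DT[, (cols) := Map(winsorize_q, .SD, w_lower[cols], w_upper[cols]),
   .SDcols = cols]
```

Indexing the parameter vectors with cols keeps them aligned with .SD even if they were defined in a different order, and the same call works unchanged for any number of columns.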
Upvotes: 1