user99

Reputation: 35

How do I pass column-specific arguments to lapply in data.table .SD?

I have seen examples of using .SD with lapply in data.table with a simple function, as below:

DT[ , c("b","d","e") := lapply(.SD, tan), .SDcols = c("b","d","e")]

But I'm unsure how to pass column-specific arguments to a function that takes multiple arguments. For instance, I have a winsorize function that I want to apply to a subset of columns in a data.table, using different percentiles for each column, e.g.

library(DescTools)
wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95))
DT[ , c("b","c") := lapply(.SD, function(x) 
{winsorize(x,wlevel$zzz$lower,wlevel$zzz$upper)}), .SDcols = c("b","c")]

Here zzz stands for the name of the column currently being processed. I have also seen threads on varying the arguments passed through lapply, but not in the context of data.table with .SDcols.

Is this possible to do?

This is a toy example; I'm looking to generalize to an arbitrarily large number of columns. Looping is always an option, but I'm trying to see if there's a more elegant/efficient solution...

Upvotes: 3

Views: 1390

Answers (1)

smci

Reputation: 33938

How to use column-specific arguments in a multiple argument function?

Use mapply(FUN, dat, params1, params2, ...) where each of params1, params2, ... can be a list or vector; mapply iterates over each of dat, params1, params2, ... in parallel.

Note that unlike the rest of the apply/lapply/sapply family, with mapply the function argument comes first, then the data and parameter(s).
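To make the parallel iteration concrete, here is a minimal base-R sketch. The data, the bounds, and the clamping function are made up for illustration and are not from the question. Note that mapply simplifies its result to a matrix by default when all results have the same length, so SIMPLIFY = FALSE is set to get a list back:

```r
# Toy data: two numeric "columns" held in a plain named list
dat <- list(b = c(1, 5, 10), c = c(2, 4, 6))
lo  <- list(b = 2, c = 3)   # per-column lower bounds (illustrative values)
hi  <- list(b = 8, c = 5)   # per-column upper bounds (illustrative values)

# The function comes first; dat, lo and hi are then walked in parallel,
# so each column is clamped to its own [lo, hi] range.
clamped <- mapply(function(x, l, h) pmin(pmax(x, l), h),
                  dat, lo, hi, SIMPLIFY = FALSE)

print(clamped)
```

Map(f, dat, lo, hi) is a shorthand for the same call with SIMPLIFY = FALSE.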

In your case, something like the following should work (pseudo-code, you'll need to tweak it to get it to run).

Instead of your nested list wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95)), it's probably easier to unpack it into two flat lists:

w_lower <- list(b=0.01, c=0.02)
w_upper <- list(b=0.99, c=0.95) 

DT[ , c('b','c') := mapply(function(x, w_lower_col, w_upper_col) { winsorize(x, w_lower_col, w_upper_col) },
  .SD, w_lower, w_upper, SIMPLIFY = FALSE), .SDcols = c('b', 'c')]

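We shouldn't need to use column names (your zzz) to index into the list; mapply() just iterates over the lists as-is. It pairs elements by position, not by name, so w_lower and w_upper must list the columns in the same order as .SDcols.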
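For a version that runs end to end, here is a sketch under a few assumptions: winsorize_q is a hypothetical stand-in for the question's winsorize (it clamps a vector to its own lower/upper quantiles and is not the DescTools implementation), and DT is random toy data. Indexing the parameter lists by the same cols vector used for .SDcols keeps the positional pairing safe even if the lists were written in a different order:

```r
library(data.table)

# Hypothetical helper (assumption): clamp x to its own lower/upper quantiles.
winsorize_q <- function(x, lower, upper) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

set.seed(1)
DT <- data.table(a = 1:100, b = rnorm(100), c = rnorm(100))

cols    <- c("b", "c")
w_lower <- list(b = 0.01, c = 0.02)
w_upper <- list(b = 0.99, c = 0.95)

# Map() returns a list, which is what := expects on the right-hand side.
DT[, (cols) := Map(winsorize_q, .SD, w_lower[cols], w_upper[cols]),
   .SDcols = cols]
```

This generalizes to any number of columns: only cols and the two parameter lists need to grow.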

Upvotes: 1
