wasyl

Reputation: 65

Apply function to data.table using function's character name and arguments as character vector

I would like to call functions on a data.table by their character names. Each function also has a character vector of arguments (there is a long list of functions to apply to the data.table), and the arguments are data.table columns. My first thought was that do.call would be a good approach for this task. Here is a simple example with one function name and its vector of columns to pass:

# set up dummy data
library(data.table)
set.seed(1)
DT <- data.table(x = rep(c("a","b"),each=5), y = sample(10), z = sample(10))
# columns to use as function arguments
mycols <- c('y','z')
# function name 
func <- 'sum'
# my current solution:
DT[, do.call(func, list(get('y'), get('z'))), by = x]
#    x V1
# 1: a 47
# 2: b 63  

I am not satisfied with that, since it requires naming each column explicitly. I would like to pass just the character vector mycols.

Another solution that works just as I need in this case is:

DT[, do.call(func, .SD), .SDcols = mycols, by = x]

But there is a hiccup with custom functions: there, the only solution that works for me is the first one:

# custom dummy function
myfunc <- function(arg1, arg2){
  arg1+arg2
}
func <- 'myfunc'
DT[, do.call(func, list(get('y'), get('z'))), by = x] 
#   x V1
#  1: a  6
#  2: a  6
#  3: a 11
#  4: a 17
#  5: a  7
#  6: b 15
#  7: b 17
#  8: b 10
#  9: b 11
# 10: b 10
# second solution does not work 
DT[, do.call(func, .SD), .SDcols = mycols, by = x]
# Error in myfunc(y = c(3L, 4L, 5L, 7L, 2L), z = c(3L, 2L, 6L, 10L, 5L)) : 
#  unused arguments (y = c(3, 4, 5, 7, 2), z = c(3, 2, 6, 10, 5))

As I understand it, do.call assumes that myfunc has arguments named y and z, which is not true. Instead, the variables y and z should be passed to the arguments arg1 and arg2.

I also tried the mget function, but with no success:

DT[, do.call(func, mget(mycols)), by = x] 
# Error: value for ‘y’ not found

I could be missing something fairly obvious. Thanks in advance for any guidance.

Upvotes: 4

Views: 1030

Answers (3)

Alex

Reputation: 15708

Yes, you are missing something (well, it's not really obvious, but careful debugging of the error identifies the problem). Your function expects arguments named arg1 and arg2. You are passing it arguments y = ... and z = ... via do.call (which you have noticed). The solution is to pass the list without names:

> DT[, do.call(func, unname(.SD[, mycols, with = FALSE])), by = x]
    x V1
 1: a  6
 2: a  6
 3: a 11
 4: a 17
 5: a  7
 6: b 15
 7: b 17
 8: b 10
 9: b 11
10: b 10
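
The name matching can also be seen outside data.table. Here is a minimal base R sketch, assuming a plain named list stands in for .SD; the names named_args, unnamed_args, named_result and unnamed_result are made up for illustration:

```r
# Same custom function as in the question.
myfunc <- function(arg1, arg2) arg1 + arg2

# A named list: do.call turns the names into argument names (y = ..., z = ...).
named_args <- list(y = 1:3, z = 4:6)

# The same values without names: do.call matches them by position.
unnamed_args <- unname(named_args)

# myfunc has no formals called y or z, so this call fails;
# tryCatch keeps the error message instead of stopping the script.
named_result <- tryCatch(
  do.call("myfunc", named_args),
  error = function(e) conditionMessage(e)
)

# Positional matching: arg1 gets 1:3, arg2 gets 4:6.
unnamed_result <- do.call("myfunc", unnamed_args)
```

Here named_result holds the error message as a string, while unnamed_result holds the element-wise sums.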

Upvotes: 1

wasyl

Reputation: 65

Here is a solution that helped me achieve what I wanted.

func <- 'sum'
mycols <- c('y','z')
DT[, do.call(func, lapply(mycols, function(x) get(x))), by = x]
#    x V1
# 1: a 47
# 2: b 63

It accepts both base functions and custom functions, so it is less restrictive than the Reduce solution.
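
A likely reason this also works with custom functions is that lapply over an unnamed character vector returns an unnamed list, whereas mget returns a list named after the variables. A small base R sketch, assuming an ordinary environment (cols_env, a made-up name) stands in for the data.table columns:

```r
# Hypothetical stand-in for the columns y and z.
cols_env <- list2env(list(y = 1:3, z = 4:6))
mycols <- c("y", "z")

myfunc <- function(arg1, arg2) arg1 + arg2

# lapply + get: the result carries no names, so do.call matches by position.
by_lapply <- lapply(mycols, function(nm) get(nm, envir = cols_env))

# mget: the result is named y and z, which do.call would pass as argument names.
by_mget <- mget(mycols, envir = cols_env)

# Works because the list is unnamed.
lapply_result <- do.call("myfunc", by_lapply)
```

Inspecting names(by_lapply) and names(by_mget) shows the difference between the two lists.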

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

This likely depends on the kinds of functions you want to use, but Reduce might be of interest to you.

Here it is with both of your examples:

mycols <- c('y','z')
func <- 'sum'

DT[, Reduce(func, mget(mycols)), by = x]
#    x V1
# 1: a 47
# 2: b 63

myfunc <- function(arg1, arg2){
  arg1+arg2
}
func <- 'myfunc'

DT[, Reduce(func, mget(mycols)), by = x]
#     x V1
#  1: a  6
#  2: a  6
#  3: a 11
#  4: a 17
#  5: a  7
#  6: b 15
#  7: b 17
#  8: b 10
#  9: b 11
# 10: b 10
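
To illustrate why this depends on the kind of function: Reduce folds a two-argument function over the list from left to right, so the function never sees the list names and only ever receives two values at a time. A minimal base R sketch with three vectors (cols, folded and manual are made-up names; the third vector is only there to show the folding):

```r
myfunc <- function(arg1, arg2) arg1 + arg2

# Three stand-in columns; names are irrelevant to Reduce.
cols <- list(y = 1:3, z = 4:6, w = 7:9)

# Reduce accepts the function name as a string.
folded <- Reduce("myfunc", cols)

# The same computation written out by hand: myfunc(myfunc(y, z), w).
manual <- myfunc(myfunc(cols[[1]], cols[[2]]), cols[[3]])
```

Both folded and manual hold the same element-wise totals, so this approach fits functions that combine two inputs at a time.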

Upvotes: 2
