rafa.pereira
rafa.pereira

Reputation: 13817

Pass strings as code to summarize multiple columns with data.table

We would like to summarize a data table to create a lot of new variables that result from the combination of columns names and values from the original data. Here is reproducile example illustrating the result we would like to achieve with two columns only for the sake of brevity

library(data.table)
data('mtcars')
setDT(mtcars)

# Desired output
mtcars[, .(
           acm_hp_carb2 = mean(hp[which( carb <= 2)], na.rm=T),
           acm_wt_am1 = mean(wt[which( am== 1)], na.rm=T)
  ), by= .(cyl, gear)]

Because we want to summarize a lot of columns, we created a function that returns all the strings that we would use to create each summary variable. In this example, we have this:

a <- 'acm_hp_carb2 = mean(hp[which( carb <= 2)], na.rm=T)'
b <- 'acm_wt_am1 = mean(wt[which( am== 1)], na.rm=T)'

And here is our failed attempt. Note that the new columns created do not receive the names we want to assign to them.

mtcars[, .(  
           eval(parse(text=a)),
           eval(parse(text=b))

          ), by= .(cyl, gear)]

Upvotes: 3

Views: 256

Answers (2)

Frank
Frank

Reputation: 66819

You can make one call and eval it:

f = function(...){
  ex = parse(text = sprintf(".(%s)", paste(..., sep=", ")))[[1]]
  print(ex)
  mtcars[, eval(ex), by=.(cyl, gear)]
}

f(a,b)

a2 <- 'acm_hp_carb2 = mean(hp[carb <= 2], na.rm=T)'
b2 <- 'acm_wt_am1 = mean(wt[am == 1], na.rm=T)'

f(a2, b2)

I guess the which() is not needed.

Upvotes: 1

IceCreamToucan
IceCreamToucan

Reputation: 28695

Seems like the only part which isn't working is the column names. If you put a and b in a vector and add names to them, you can use lapply to do the eval(parse and keep the names from the vector. I used regex to get the names, but presumably in the real code you can assign the names as whatever variable you're using to construct the strings in the first place.

Result has many NaNs but it matches your desired output.

to_make <- c(a, b)
to_make <- setNames(to_make, sub('^(.*) =.*', '\\1', to_make))

mtcars2[, lapply(to_make, function(x) eval(parse(text = x)))
          , by= .(cyl, gear)]

#    cyl gear acm_hp_carb2 acm_wt_am1
# 1:   6    4          NaN   2.747500
# 2:   4    4         76.0   2.114167
# 3:   6    3        107.5        NaN
# 4:   8    3        162.5        NaN
# 5:   4    3         97.0        NaN
# 6:   4    5        102.0   1.826500
# 7:   8    5          NaN   3.370000
# 8:   6    5          NaN   2.770000

Upvotes: 3

Related Questions