John Horton

Reputation: 4272

Avoid wrapping a function argument in "quote()" when using data.table in a function

I have a function, create.summary, that, when passed a column name, summarizes the values of that column by month, year, and treatment. Note the use of eval() in the j expression of the data table.

library(data.table)

create.summary <- function(full.panel.df, outcome.name){
    # outcome.name must be a quoted symbol, e.g. quote(hourly_earnings)
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(eval(outcome.name)),
                                        se = sd(eval(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}

For this to work, I need to call this function with the column name quoted, like so: create.summary(df, quote(hourly_earnings))

but this is a pain and will confuse my users. I'd rather have users be able to call this function with the column name as a string: create.summary(df, "hourly_earnings")

I'm guessing there is some combination of deparse, eval, substitute etc. that will make this work, but I can't figure it out and I'm just trying things more or less at random.

Upvotes: 2

Views: 126

Answers (3)

npjc

Reputation: 4194

For my (and hopefully others') sake, I've lined up my answer with @GSee's and @BrodieG's according to the call style each one supports. At the very least, I've found this comparison useful.

for: create.summary(df, hourly_earnings)

  • change eval to evalq, definitely seems simplest in this case.

    from the help file:

    "The evalq form is equivalent to eval(quote(expr), ...). eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this."

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(evalq(outcome.name)),
                                        se = sd(evalq(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
  • using substitute() and get():

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
  out.name.quoted <- as.character(substitute(outcome.name))
  df.apps <- data.table(full.panel.df)[, list(
    Y = mean(get(out.name.quoted)),
    se = sd(get(out.name.quoted))/sqrt(.N)
    ),
    by = list(month, year, trt)
  ]
  df.apps
}
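
To see what these two building blocks do on their own, here is a small base R sketch outside of data.table. The names stored.expr and arg.name are made up for illustration and are not part of any of the answers.

```r
# Illustrative only: `stored.expr` and `arg.name` are hypothetical names.

# eval() first evaluates its argument (yielding the stored call) and then
# evaluates that call; evalq() quotes its argument, so it only looks up the
# symbol and hands back the stored call unevaluated.
stored.expr  <- quote(1 + 2)
eval.result  <- eval(stored.expr)   # the value of the stored call
evalq.result <- evalq(stored.expr)  # the stored call itself

# substitute() captures the expression the caller typed for an argument;
# as.character() maps a bare name and a string to the same string.
arg.name <- function(outcome.name) as.character(substitute(outcome.name))
bare.result   <- arg.name(hourly_earnings)
string.result <- arg.name("hourly_earnings")

print(eval.result)
print(evalq.result)
print(bare.result)
print(string.result)
```

One consequence is that the substitute() version accepts either a bare name or a string literal. It will not work if the caller passes a variable that holds the column name, because substitute() captures the variable's name rather than its value.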

for: create.summary(df, "hourly_earnings")

  • get() searches for an object of that name; it is safer than parse(text=)

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(get(outcome.name)),
                                        se = sd(get(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
  • parse(text=), which is useful for synthesizing expressions or reading code from a file.

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(eval(parse(text=outcome.name))),
                                        se = sd(eval(parse(text=outcome.name)))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
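
As a rough illustration of why get() is the safer of the two string-based options, the base R sketch below compares them on a plain name and on a string containing code. The environment cols is a hypothetical stand-in for the columns that data.table makes visible in j.

```r
# Illustrative only: `cols` stands in for the columns visible inside j.
cols <- list2env(list(wt = c(1, 2, 3)))

# With a plain column name, both approaches return the column.
by.get   <- get("wt", envir = cols)
by.parse <- eval(parse(text = "wt"), envir = cols)

# With anything other than a plain name they diverge: parse() accepts any
# R code, while get() only looks up an object with exactly that name.
parsed.code <- eval(parse(text = "wt * 2"), envir = cols)
get.outcome <- tryCatch(get("wt * 2", envir = cols),
                        error = function(e) "lookup failed")

print(parsed.code)
print(get.outcome)
```

Because parse(text=) will run whatever code the string contains, it is best reserved for cases where an arbitrary expression is really what you want to accept.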

Upvotes: 2

BrodieG

Reputation: 52637

And another using substitute and get:

create.summary <- function(full.panel.df, outcome.name){
  out.name.quoted <- as.character(substitute(outcome.name))
  df.apps <- data.table(full.panel.df)[, list(
    Y = mean(get(out.name.quoted)),
    se = sd(get(out.name.quoted))/sqrt(.N)
    ),
    by = list(month, year, trt)
  ]
  df.apps
}

Usage:

create.summary(df, a)

With some data:

df <- data.frame(month=month.abb, year=rep(2000:2005, each=24), trt=c("one", "two"), a=runif(6 * 12), b=runif(6 * 12))

Upvotes: 1

GSee

Reputation: 49810

Try using get() instead of eval():

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(get(outcome.name)),
                                        se = sd(get(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}

Here's a reproducible example:

foo <- function(x, n) {
  data.table(x)[, list(Y=mean(get(n)),
                       se=sd(get(n))/sqrt(.N)),
                by=list(cyl, am)]
}

foo(mtcars, "wt")
#    cyl am        Y         se
# 1:   6  1 2.755000 0.07399324
# 2:   4  1 2.042250 0.14472656
# 3:   6  0 3.388750 0.05810820
# 4:   8  0 4.104083 0.22179111
# 5:   4  0 2.935000 0.23528352
# 6:   8  1 3.370000 0.20000000
foo(mtcars, "hp")
#    cyl am         Y        se
# 1:   6  1 131.66667 21.666667
# 2:   4  1  81.87500  8.009899
# 3:   6  0 115.25000  4.589390
# 4:   8  0 194.16667  9.630156
# 5:   4  0  84.66667 11.348030
# 6:   8  1 299.50000 35.500000
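
One caveat with get(): if the string does not name a column, the lookup can continue into enclosing environments, so a typo may pick up an unrelated object or fail with a less obvious error. Below is a minimal sketch of a guard, assuming the data.table package is installed. The name foo.checked is hypothetical and not part of the answer above.

```r
library(data.table)

# Hypothetical variant of foo() that validates the column name up front.
foo.checked <- function(x, n) {
  # Require a single string that names an existing column of x.
  stopifnot(is.character(n), length(n) == 1L, n %in% names(x))
  data.table(x)[, list(Y  = mean(get(n)),
                       se = sd(get(n)) / sqrt(.N)),
                by = list(cyl, am)]
}

res <- foo.checked(mtcars, "wt")

# A misspelled or missing column is rejected before data.table is called.
bad <- tryCatch(foo.checked(mtcars, "no_such_column"),
                error = function(e) "rejected")
print(bad)
```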

Upvotes: 3
