rafa.pereira

Reputation: 13807

Using rlang double curly braces {{ in data.table

Problem

The {{}} operator from the rlang package makes it easy to pass unquoted column names as function arguments (quasiquotation). I understand rlang is intended to work with the tidyverse, but is there a way to use {{}} in data.table?

Intended use of {{}} with dplyr

library(dplyr)

test_dplyr <- function(dt, col1, col2){

  # {{ }} forwards the unquoted column names to the dplyr verbs
  temp <- dt %>%
            group_by( {{col2}} ) %>%
            summarise(test = mean( {{col1}} ))

  return(temp)
}

test_dplyr(dt=iris, col1=Sepal.Length, col2=Species)

> # A tibble: 3 x 2
>   Species     test
>   <fct>      <dbl>
> 1 setosa      5.01
> 2 versicolor  5.94
> 3 virginica   6.59

Failed attempt of using {{}} with data.table

This is ideally what I would like to do, but it returns an ERROR.

test_dt2 <- function(dt, col1, col2){
  
  data.table::setDT(dt)
  temp <- dt[, .( test = mean({{col1}})), by = {{col2}} ]
  return(temp)
}

# error with unquoted column names
test_dt2(dt=iris, col1= Sepal.Length, col2= Species)

# also an error with quoted column names
test_dt2(dt=iris, col1= 'Sepal.Length', col2= 'Species')

Alternative use of rlang with data.table

Here is an alternative way to use rlang with data.table. It has two inconveniences: every column name variable has to go through rlang::ensym(), and the data.table operation has to be wrapped in rlang::inject().

test_dt <- function(dt, col1, col2){
  
  # capture the column names as symbols
  col1 <- rlang::ensym(col1)
  col2 <- rlang::ensym(col2)
  
  data.table::setDT(dt)
  temp <- rlang::inject( dt[, .( test = mean(!!col1)), by = !!col2] )
  return(temp)
}

test_dt(dt=iris, col1='Sepal.Length', col2='Species')

>       Species  test
> 1:     setosa 5.006
> 2: versicolor 5.936
> 3:  virginica 6.588

Upvotes: 3

Views: 502

Answers (1)

G. Grothendieck

Reputation: 269291

I don't think you want to use rlang with data.table. data.table already has more convenient facilities of its own. I also suggest not using setDT here, because it converts dt in place and so has the side effect of modifying the object that was passed in. as.data.table returns a converted copy instead.

library(data.table)

test_dt <- function(dt, col1, col2) {
  as.data.table(dt)[, .( test = mean(.SD[[col1]])), by = c(col2)]
}

test_dt(dt = iris, col1 = 'Sepal.Length', col2 = 'Species')
##       Species  test
## 1:     setosa 5.006
## 2: versicolor 5.936
## 3:  virginica 6.588

This also works:

test_dt <- function(dt, col1, col2) {
  as.data.table(dt)[, .( test = mean(get(col1))), by = c(col2)]
}

test_dt(dt=iris, col1='Sepal.Length', col2='Species')
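
Another option, sketched here on the assumption that data.table 1.15.0 or later is installed (the release in which `[.data.table` gained an `env` argument): the placeholders `col1` and `col2` in the query are substituted before evaluation, and character values in `env` are turned into column names. The function name `test_dt_env` is only illustrative.

```r
library(data.table)

# Sketch, assuming data.table >= 1.15.0 (needed for the `env` argument).
# `test_dt_env` is a hypothetical name used for illustration.
test_dt_env <- function(dt, col1, col2) {
  # as.data.table() returns a converted copy, so the caller's object is untouched.
  # Character values in `env` are substituted into the query as column names.
  as.data.table(dt)[, .(test = mean(col1)), by = col2,
                    env = list(col1 = col1, col2 = col2)]
}

res <- test_dt_env(dt = iris, col1 = "Sepal.Length", col2 = "Species")
print(res)
```

As with the two versions above, the column names are passed as strings, and the result should contain the same three group means.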

Upvotes: 6
