Chris

Reputation: 313

Conditionally working with variables in data.table

This is a small challenge within a big project, so I'm going to try to keep this simple.

I'm attempting to conditionally add columns to a data.table, and then process them on a conditional basis.

library(data.table)

x <- TRUE
y <- data.table(a = 1:10, b = c(rep(1, 5), rep(2, 5)))

y[  # filter some rows
  a != 1
][  # conditionally add two calculated columns
  ,
  if(x){
    `:=` (
      c = a*b,
      d = 1/b
    )
  }
][  # process columns and group
  ,
  list(
    a = sum(a),
    b = sum(b),
    if(x) c = sum(c)  # only add c if it's created above
  ),
  by = if(x) list(b, d) else list(b)  # only group by d if it's created above
]

Here is the output (the error refers to the second set of []):

Error in eval(expr, envir, enclos) : object 'd' not found
In addition: Warning message:
In deconstruct_and_eval(m, envir, enclos) :
  Caught and removed `{` wrapped around := in j. := and `:=`(...) are 
                defined for use in j, once only and in particular ways. See help(":=").

Of course, the error is a symptom of the warning. How can I get this done?


As @Michal pointed out, putting the if() statement outside the data.table call is an option:

if(x) {
  y[
   ...
  ]
} else {
  y[
   ...
  ]
}

I'm hoping there's a way to get this done without repeating the code in its entirety, to simplify everything.

Upvotes: 2

Views: 207

Answers (1)

eddi

Reputation: 49448

I can't think of a way of doing it inside the j-expression, because of how := gets evaluated in there (it really only works if it's at the root of the expression tree), but you could put it in the i-expression as a workaround:

x = FALSE
y[a != 1][x, `:=`(c = a * b, d = 1/b)][]
#    a b
#1:  2 1
#2:  3 1
#3:  4 1
#4:  5 1
#5:  6 2
#6:  7 2
#7:  8 2
#8:  9 2
#9: 10 2

x = TRUE
y[a != 1][x, `:=`(c = a * b, d = 1/b)][]
#    a b  c   d
#1:  2 1  2 1.0
#2:  3 1  3 1.0
#3:  4 1  4 1.0
#4:  5 1  5 1.0
#5:  6 2 12 0.5
#6:  7 2 14 0.5
#7:  8 2 16 0.5
#8:  9 2 18 0.5
#9: 10 2 20 0.5

Since c(1) is the same as c(1, NULL), and an if() without an else returns NULL when its condition is FALSE, c() can be used to build a vector or list when you're not sure in advance how many elements it will contain.
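As a quick check of that behaviour, here is a minimal base R sketch. The variable names (skipped, by_cols_off, by_cols_on, j_cols) are only illustrative and are not part of the original code:

```r
# An if() with no else branch evaluates to NULL when the condition is FALSE
skipped <- if (FALSE) "d"
is.null(skipped)  # TRUE

# c() silently drops NULL arguments, so the optional element simply disappears
by_cols_off <- c("b", if (FALSE) "d")  # "b"
by_cols_on  <- c("b", if (TRUE) "d")   # "b" "d"

# The same holds when combining lists, which is what j needs
j_cols <- c(list(a = 1), if (FALSE) list(c = 2))
identical(j_cols, list(a = 1))  # TRUE
```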

To conditionally include columns in j:

y[
  ,
  c(
    list(
      a = sum(a), 
      b = sum(b)
    ), 
    if(x) list(c = sum(c))
  )
]

And to conditionally include columns in by:

y[
  ,
  ...,
  by = c("b", if(x) "d")
]

by won't accept a vector of lists, but it will accept a character vector of column names.
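Putting the j and by pieces together, the whole pipeline might look like the sketch below. It rests on a few assumptions that are not in the original posts: the wrapper function summarise_y is hypothetical, the output columns are renamed to sum_a, sum_b and sum_c so they don't clash with the grouping column b, and the columns are added by wrapping only the := step in an ordinary if() rather than by using the i-expression workaround.

```r
library(data.table)

y <- data.table(a = 1:10, b = c(rep(1, 5), rep(2, 5)))

# Hypothetical helper: x toggles the extra columns c and d
summarise_y <- function(x) {
  z <- y[a != 1]                             # filtering returns a copy, so y is untouched
  if (x) z[, `:=`(c = a * b, d = 1 / b)]     # only this step is conditional
  z[
    ,
    c(
      list(sum_a = sum(a), sum_b = sum(b)),
      if (x) list(sum_c = sum(c))            # NULL (dropped by c()) when x is FALSE
    ),
    by = c("b", if (x) "d")                  # character vector of grouping columns
  ]
}

res_true  <- summarise_y(TRUE)   # columns: b, d, sum_a, sum_b, sum_c
res_false <- summarise_y(FALSE)  # columns: b, sum_a, sum_b
```

Only the one-line := step is duplicated logic here, so the filter and the aggregation are each written once.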

Upvotes: 2
