Reputation: 4206
I'd like to assign a value to a variable, then use that variable to create a new variable. The data.table syntax supports multiple assignment, but apparently a column created in a := call cannot be referenced by another column in the same call. The "i" and "by" clauses in my real use case are more complicated, so I'd prefer not to repeat code like this:
require(data.table)
dt <- data.table(
  x = 1:5,
  y = 2:6
)
# this works
dt[x == 3, z1 := x + y]
dt[x == 3, z2 := z1 + 5]
# but I wish this worked
dt[x == 3, `:=`(
  z1 = x + y,
  z2 = z1 + 5
)]
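To make the failure concrete, here is a minimal sketch on a fresh table, where no z1 column exists yet. The name `msg` and the use of tryCatch are only for illustration. Note that if the two-step version above has already been run, z1 exists and the combined call would quietly use the old z1 values instead of failing.

```r
library(data.table)

# Fresh table, so there is no z1 column yet
dt <- data.table(x = 1:5, y = 2:6)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    dt[x == 3, `:=`(z1 = x + y, z2 = z1 + 5)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# The right-hand sides are evaluated against the existing columns,
# so z1 cannot be found when z2 is computed
print(msg)
```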
In contrast, this works in dplyr:
require(dplyr)
df <- data.frame(
  x = 1:5,
  y = 2:6
)
df <- mutate(df,
  z1 = x + y,
  z2 = z1 + 5
)
Is there a clean way to do this using data.table?
EDIT: Tweaking akrun's solution slightly, I figured out a way to keep the readable, sequential syntax I was looking for. It just does all of the operations inside the braces, before building the list:
dt[x == 3, c('z1', 'z2', 'z3') := {
  z1 <- x + y
  z2 <- z1 + 5
  z3 <- z2 + 6
  list(z1, z2, z3)
}]
Upvotes: 4
Views: 488
Reputation: 886938
We can use curly brackets to create the temporary variables, return them in a list along with any calculation based on them, and assign (:=) the result to the columns we need to create.
dt[x == 3, c('z1', 'z2') := {
  z1 <- x + y
  list(z1, z1 + 5)
}]
dt
#   x y z1 z2
#1: 1 2 NA NA
#2: 2 3 NA NA
#3: 3 4  7 12
#4: 4 5 NA NA
#5: 5 6 NA NA
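Since the question mentions more complicated "i" and "by" clauses, here is a small sketch suggesting the same brace pattern should carry over to grouped assignment, with the temporary variable computed once per group. The grouping column `g` and the `sum(z1)` term are made up for illustration and are not part of the original problem.

```r
library(data.table)

# Hypothetical table with a grouping column g
dt <- data.table(g = c("a", "a", "b", "b", "b"), x = 1:5, y = 2:6)

dt[x >= 2, c("z1", "z2") := {
  z1 <- x + y                # temporary variable, per group
  list(z1, z1 + sum(z1))     # second column reuses z1
}, by = g]

print(dt)
```

Rows excluded by the "i" clause (here the first row) are left as NA in both new columns.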
To make it a bit faster, we can use setkey
setkey(dt, x)[.(3), c('z1', 'z2') := {
  z1 <- x + y
  list(z1, z1 + 5)
}]
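One detail worth keeping in mind when subsetting a keyed table: a value wrapped in .() is looked up in the key, whereas a bare number in "i" is treated as a row number. A minimal sketch with a made-up table (the names `by_key` and `by_row` are only for illustration):

```r
library(data.table)

dt <- data.table(x = c(5L, 3L, 3L, 1L), y = 1:4)
setkey(dt, x)            # rows are now sorted by x: 1, 3, 3, 5

by_key <- dt[.(3L)]      # key lookup: every row where x == 3
by_row <- dt[3]          # positional: the third row only

print(by_key)
print(by_row)
```

The two only coincide when the key value happens to match the row position, as it does in the five-row example table.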
Benchmarks on a larger dataset:
set.seed(24)
dt1 <- data.table(x = sample(1:9, 1e8, replace=TRUE), y = sample(5:9, 1e8, replace=TRUE))
dt2 <- copy(dt1)
dt3 <- copy(dt1)
akrun1 <- function() {
  dt1[x == 3, c('z1', 'z2') := {
    z1 <- x + y
    list(z1, z1 + 5)
  }]
}
akrun2 <- function() {
  setkey(dt3, x)[.(3), c('z1', 'z2') := {
    z1 <- x + y
    list(z1, z1 + 5)
  }]
}
rsoren <- function() {
  dt2[x == 3, z1 := x + y]
  dt2[x == 3, z2 := z1 + 5]
}
library(microbenchmark)
microbenchmark(akrun1(), akrun2(), rsoren(), unit= "relative", times = 20L)
#Unit: relative
#     expr      min       lq     mean   median       uq       max neval
# akrun1() 1.597267 1.605404 1.393016 1.642584 1.538929 0.8634406    20
# akrun2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000    20
# rsoren() 2.584153 2.586185 2.179601 2.694469 2.468219 0.9740701    20
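Alongside the timings, it can be worth checking that the approaches produce the same values. The sketch below does this on a much smaller table; the table size and the names `base`, `a`, `b`, `k` are made up here and are not part of the benchmark above.

```r
library(data.table)

set.seed(24)
base <- data.table(x = sample(1:9, 1e4, replace = TRUE),
                   y = sample(5:9, 1e4, replace = TRUE))

# Single call with a temporary variable inside braces
a <- copy(base)
a[x == 3, c("z1", "z2") := {
  z1 <- x + y
  list(z1, z1 + 5)
}]

# Two separate calls, as in the question
b <- copy(base)
b[x == 3, z1 := x + y]
b[x == 3, z2 := z1 + 5]

# Keyed version; setkey reorders the rows, so only the x == 3 rows are compared
k <- setkey(copy(base), x)
k[.(3L), c("z1", "z2") := {
  z1 <- x + y
  list(z1, z1 + 5)
}]

same_ab <- identical(a$z1, b$z1) && identical(a$z2, b$z2)
same_ak <- identical(a[x == 3, z2], k[x == 3, z2])
print(c(same_ab, same_ak))
```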
Upvotes: 4