Reputation: 81
I'm using the data.table package in R. I am trying to create a set of lagged variables using the set function in data.table.
Here is an example that works perfectly.
library(data.table)

DT <- data.table(
  id = sample(c("US", "Other"), 25, replace = TRUE),
  loc = sample(LETTERS[1:5], 25, replace = TRUE),
  index = runif(25)
)
DT
ALL_FEATURES <- "index"
LAG_VALS <- 1:2
# Add one lagged column per (variable, lag) pair, by reference
for (each_var in ALL_FEATURES) {
  for (each_lag in LAG_VALS) {
    set(DT,
        j = paste0(each_var, "_lag_", each_lag),
        value = shift(DT[[each_var]], n = each_lag, type = "lag"))
  }
}
DT
OK, that is great. But what if I want to compute the lags by the id column, so that the lags are generated separately for each id value? Can I specify that in the set function?
Upvotes: 2
Views: 84
Reputation: 16697
The set function does not accept grouping, so you need to use [ (that is, DT[...]) instead.
There is no need for any loops, because shift is vectorized not just over its n argument but also over its x argument.
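To make the vectorization concrete, here is a minimal sketch on a toy list (the names x, a, b and res are made up for illustration). It suggests that shift returns one vector per (column, lag) pair, ordered with all lags of the first column first, then all lags of the second.

```r
library(data.table)

# Toy input: two "columns" in x and two lags in n (illustrative names)
x <- list(a = 1:4, b = 11:14)
res <- shift(x, n = 1:2, type = "lag")

length(res)  # 4: one vector per (column, lag) pair
res[[1]]     # a, lag 1:  NA  1  2  3
res[[2]]     # a, lag 2:  NA NA  1  2
res[[3]]     # b, lag 1:  NA 11 12 13
```

This ordering is why any vector of new column names has to list every lag of one feature before moving on to the next feature.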
I slightly extended your example to use two columns in x:
library(data.table)
DT = data.table(
id = sample(c("US", "Other"), 25, replace = TRUE),
loc = sample(LETTERS[1:5], 25, replace = TRUE),
index = runif(25),
index2 = runif(25)
)
ALL_FEATURES=c("index","index2")
LAG_VALS=1:2
cols = paste0(rep(ALL_FEATURES, each=length(LAG_VALS)),"_lag_",rep(LAG_VALS, length(ALL_FEATURES)))
DT[, (cols) := shift(.SD, n=LAG_VALS, type="lag"), by=id, .SDcols=ALL_FEATURES]
Providing column names will no longer be necessary once https://github.com/Rdatatable/data.table/issues/1543 is implemented; then shift(..., give.names=TRUE) will be enough.
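As a quick sanity check of the grouped behaviour, here is a small deterministic sketch (the table contents are invented for illustration, not taken from the question). It compares a lag computed with by = id against an ungrouped lag on the same column.

```r
library(data.table)

# Small hand-made table so the result can be checked by eye
DT <- data.table(
  id    = c("US", "US", "Other", "US", "Other"),
  index = c(1, 2, 10, 3, 20)
)

# Grouped lag: each id only sees its own previous value
DT[, index_lag_1 := shift(index, n = 1, type = "lag"), by = id]
DT$index_lag_1           # NA  1 NA  2 10

# Ungrouped lag for comparison: values leak across ids
shift(DT$index, n = 1)   # NA  1  2 10  3
```

The first row of each id gets NA in the grouped version, which is the behaviour the question asks for.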
Upvotes: 3
Reputation: 389047
I am not sure if this is possible with set, but you can achieve this with a single loop, since the n argument of shift can take more than one value.
ALL_FEATURES <- "index"
LAG_VALS <- 1:2
# One pass per variable; shift() handles all lags at once, grouped by id
for (each_var in ALL_FEATURES) {
  DT[, paste0(each_var, "_lag_", LAG_VALS) := shift(get(each_var),
                                                    n = LAG_VALS, type = "lag"),
     by = id]
}
DT
Upvotes: 0