Abraham Mathew

Reputation: 81

Creating lags in the set function by group

I'm using the data.table package in R and am trying to create a set of lagged variables with its set function.

Here is an example that works perfectly.

library(data.table)

DT <- data.table(
  id = sample(c("US", "Other"), 25, replace = TRUE),
  loc = sample(LETTERS[1:5], 25, replace = TRUE),
  index = runif(25)
)
DT

ALL_FEATURES <- "index"
LAG_VALS <- 1:2
# Add one lagged column per feature and lag, by reference (no grouping)
for (each_var in ALL_FEATURES) {
  for (each_lag in LAG_VALS) {
    set(DT,
        j = paste0(each_var, "_lag_", each_lag),
        value = shift(DT[[each_var]], n = each_lag, type = "lag"))
  }
}
DT

OK, that works. But what if I want to compute the lags separately within each value of the id column? Can I specify that in the set function?

Upvotes: 2

Views: 84

Answers (2)

jangorecki

Reputation: 16697

The set function does not support grouping, so you need to use [ with by instead. No loops are needed, because shift is vectorized over both its n argument and its x argument. I slightly extended your example so that x holds two columns:

library(data.table)
DT = data.table(
        id = sample(c("US", "Other"), 25, replace = TRUE), 
        loc = sample(LETTERS[1:5], 25, replace = TRUE), 
        index = runif(25),
        index2 = runif(25)
        )
ALL_FEATURES=c("index","index2")
LAG_VALS=1:2

cols = paste0(rep(ALL_FEATURES, each=length(LAG_VALS)),"_lag_",rep(LAG_VALS, length(ALL_FEATURES)))
DT[, (cols) := shift(.SD, n=LAG_VALS, type="lag"), by=id, .SDcols=ALL_FEATURES]
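The cols vector has to list the new names in the same order in which shift returns its results. A minimal sketch of that ordering, using a made-up two-element list (the names a and b are illustrative, not from the question): every lag of the first element comes first, then every lag of the second.

```r
library(data.table)

# Hypothetical input: two short vectors standing in for two feature columns
x <- list(a = 1:3, b = 4:6)
lags <- 1:2

# shift() returns one result per (element, lag) pair,
# with the lags varying fastest within each element
res <- shift(x, n = lags, type = "lag")

# Build names in the same order: a_lag_1, a_lag_2, b_lag_1, b_lag_2
cols <- paste0(rep(names(x), each = length(lags)), "_lag_",
               rep(lags, length(x)))
res <- setNames(res, cols)
str(res)
```

This is why rep(ALL_FEATURES, each = length(LAG_VALS)) is paired with rep(LAG_VALS, length(ALL_FEATURES)) when building cols.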

Providing the column names will no longer be necessary once https://github.com/Rdatatable/data.table/issues/1543 is implemented; then shift(..., give.names=TRUE) will be enough.

Upvotes: 3

Ronak Shah

Reputation: 389047

I am not sure whether this is possible with set, but you can achieve it with a single loop, since the n argument of shift accepts a vector of lags.

ALL_FEATURES="index"
LAG_VALS=1:2

for (each_var in ALL_FEATURES) {
  # One := call per feature creates all of its lag columns, grouped by id
  DT[, paste0(each_var, "_lag_", LAG_VALS) :=
       shift(get(each_var), n = LAG_VALS, type = "lag"),
     by = id]
}
DT
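To confirm that the lags restart within each id, a small table that can be checked by hand may help. This is only a sketch with made-up values, not the random data from the question.

```r
library(data.table)

# Hypothetical, deterministic data so the result can be verified by eye
DT_check <- data.table(
  id    = c("US", "Other", "US", "Other", "US"),
  index = c(10, 20, 30, 40, 50)
)

LAG_VALS <- 1:2
cols <- paste0("index_lag_", LAG_VALS)

# Lag within each id; rows keep their original order
DT_check[, (cols) := shift(index, n = LAG_VALS, type = "lag"), by = id]
DT_check
```

The first row of each id gets NA for lag 1, and the first two rows of each id get NA for lag 2, which shows that values are never carried across groups.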

Upvotes: 0
