EngrStudent
EngrStudent

Reputation: 2022

Create lagged columns using data.table

Background:
Lets say I have this code

library(data.table)

#reproducibility
set.seed(45L)

#make table
dt <- data.table(V1=c(1L,2L),
                 V2=LETTERS[1:3],
                 V3=round(rnorm(4),4),
                 V4 = 1:12)
dt

for which I get

> dt
    V1 V2      V3 V4
 1:  1  A  0.3408  1
 2:  2  B -0.7033  2
 3:  1  C -0.3795  3
 4:  2  A -0.7460  4
 5:  1  B  0.3408  5
 6:  2  C -0.7033  6
 7:  1  A -0.3795  7
 8:  2  B -0.7460  8
 9:  1  C  0.3408  9
10:  2  A -0.7033 10
11:  1  B -0.3795 11
12:  2  C -0.7460 12

and I want to append create 10 columns of the lags of V3.

Question:
Is there a way to do it within the "data-table" paradigm.

More details:
If it were a data.frame then I could do a loop.

dt <- as.data.frame(dt)


for(i in 1:10){
     dt <- cbind(dt, shift(x = dt[, 3], 
                           n = i, 
                           fill = NA, 
                           type = "lag"))
     names(dt)[ncol(dt)] <- sprintf("lag_%06d",i)
}

dt

for which I get:

> dt
   V1 V2      V3 V4 lag_000001 lag_000002 lag_000003 lag_000004 lag_000005 lag_000006 lag_000007 lag_000008 lag_000009 lag_000010
1   1  A  0.3408  1         NA         NA         NA         NA         NA         NA         NA         NA         NA         NA
2   2  B -0.7033  2     0.3408         NA         NA         NA         NA         NA         NA         NA         NA         NA
3   1  C -0.3795  3    -0.7033     0.3408         NA         NA         NA         NA         NA         NA         NA         NA
4   2  A -0.7460  4    -0.3795    -0.7033     0.3408         NA         NA         NA         NA         NA         NA         NA
5   1  B  0.3408  5    -0.7460    -0.3795    -0.7033     0.3408         NA         NA         NA         NA         NA         NA
6   2  C -0.7033  6     0.3408    -0.7460    -0.3795    -0.7033     0.3408         NA         NA         NA         NA         NA
7   1  A -0.3795  7    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408         NA         NA         NA         NA
8   2  B -0.7460  8    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408         NA         NA         NA
9   1  C  0.3408  9    -0.7460    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408         NA         NA
10  2  A -0.7033 10     0.3408    -0.7460    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408         NA
11  1  B -0.3795 11    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408
12  2  C -0.7460 12    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033     0.3408    -0.7460    -0.3795    -0.7033

There has to be a more elegant way. Something that can handle much larger data much more quickly and efficiently.

Now what if I wanted to first do this, to make 10 lags, on column V3, then on V4, without presupposing their names. I could simply make nested for loops, but again I suspect that data.table has something good (great?) to offer there.

Upvotes: 4

Views: 3210

Answers (3)

Charlie
Charlie

Reputation: 522

The answer by Mike H. didn't work for me -- I think likely due to data.table updates. A modified form is:

lagcols <- c('test1','test2') #input column names

lagcols2<- paste0("lag",  #output column names
  rep(lag, times = length(lagcols)),'_',
  rep(lagcols, each = length(lag)))

lag <- c(1,2,10) #desired lags

dt[, (lagcols2) :=  shift(.SD, lag), by=id]

Upvotes: 1

Mike H.
Mike H.

Reputation: 14370

It's also possible to do it all in 1 data.table call. This could be particularly useful if you need to a lot of columns:

cols <- c("V3","V4")
dt[, (paste0("lag_",rep(cols, each = 10), "_", rep(1:10, times = length(cols)))) := 
            unlist(lapply(.SD, function(x) shift(x, 1:10, type = "lag")), recursive = F), .SDcols = cols]

The paste0(...) code sets up the column names the way we want and then the unlist(lapply(...)) code gets the lags for each column in the order we want. To see how each works you can run them individually (if running the unlist(lapply(...)) you'll have to substitute in dt[,c("V3","V4")] for .SD.

dt[,1:9]
#    V1 V2      V3 V4 lag_V3_1 lag_V4_1 lag_V3_2 lag_V4_2 lag_V3_3
# 1:  1  A  0.3408  1       NA       NA       NA       NA       NA
# 2:  2  B -0.7033  2   0.3408        1       NA       NA       NA
# 3:  1  C -0.3795  3  -0.7033        2   0.3408        1       NA
# 4:  2  A -0.7460  4  -0.3795        3  -0.7033        2   0.3408
# 5:  1  B  0.3408  5  -0.7460        4  -0.3795        3  -0.7033
# 6:  2  C -0.7033  6   0.3408        5  -0.7460        4  -0.3795
# 7:  1  A -0.3795  7  -0.7033        6   0.3408        5  -0.7460
# 8:  2  B -0.7460  8  -0.3795        7  -0.7033        6   0.3408
# 9:  1  C  0.3408  9  -0.7460        8  -0.3795        7  -0.7033
#10:  2  A -0.7033 10   0.3408        9  -0.7460        8  -0.3795
#11:  1  B -0.3795 11  -0.7033       10   0.3408        9  -0.7460
#12:  2  C -0.7460 12  -0.3795       11  -0.7033       10   0.3408

Upvotes: 4

mt1022
mt1022

Reputation: 17299

Yes, shift considers this type of user-demand. When parameter n for shift is a vector, shift will return a list for each shift in n:

dt[, sprintf("V3_lag_%06d", 1:10) := shift(V3, 1:10, type = 'lag')]
dt[, sprintf("V4_lag_%06d", 1:10) := shift(V4, 1:10, type = 'lag')]

#     V1 V2      V3 V4 V3_lag_000001 V3_lag_000002 V3_lag_000003 V3_lag_000004 V3_lag_000005
#  1:  1  A  1.2322  1            NA            NA            NA            NA            NA
# 2:  2  B  1.6094  2        1.2322            NA            NA            NA            NA
# 3:  1  C  0.4016  3        1.6094        1.2322            NA            NA            NA
# 4:  2  A -0.2730  4        0.4016        1.6094        1.2322            NA            NA
# 5:  1  B  1.2322  5       -0.2730        0.4016        1.6094        1.2322            NA
# 6:  2  C  1.6094  6        1.2322       -0.2730        0.4016        1.6094        1.2322
# 7:  1  A  0.4016  7        1.6094        1.2322       -0.2730        0.4016        1.6094
# 8:  2  B -0.2730  8        0.4016        1.6094        1.2322       -0.2730        0.4016
# 9:  1  C  1.2322  9       -0.2730        0.4016        1.6094        1.2322       -0.2730
# 10:  2  A  1.6094 10        1.2322       -0.2730        0.4016        1.6094        1.2322
# 11:  1  B  0.4016 11        1.6094        1.2322       -0.2730        0.4016        1.6094
# 12:  2  C -0.2730 12        0.4016        1.6094        1.2322       -0.2730        0.4016
# V3_lag_000006 V3_lag_000007 V3_lag_000008 V3_lag_000009 V3_lag_000010 V4_lag_000001
# 1:            NA            NA            NA            NA            NA            NA
# 2:            NA            NA            NA            NA            NA             1
# 3:            NA            NA            NA            NA            NA             2
# 4:            NA            NA            NA            NA            NA             3
# 5:            NA            NA            NA            NA            NA             4
# 6:            NA            NA            NA            NA            NA             5
# 7:        1.2322            NA            NA            NA            NA             6
# 8:        1.6094        1.2322            NA            NA            NA             7
# 9:        0.4016        1.6094        1.2322            NA            NA             8
# 10:       -0.2730        0.4016        1.6094        1.2322            NA             9
# 11:        1.2322       -0.2730        0.4016        1.6094        1.2322            10
# 12:        1.6094        1.2322       -0.2730        0.4016        1.6094            11
# V4_lag_000002 V4_lag_000003 V4_lag_000004 V4_lag_000005 V4_lag_000006 V4_lag_000007
# 1:            NA            NA            NA            NA            NA            NA
# 2:            NA            NA            NA            NA            NA            NA
# 3:             1            NA            NA            NA            NA            NA
# 4:             2             1            NA            NA            NA            NA
# 5:             3             2             1            NA            NA            NA
# 6:             4             3             2             1            NA            NA
# 7:             5             4             3             2             1            NA
# 8:             6             5             4             3             2             1
# 9:             7             6             5             4             3             2
# 10:             8             7             6             5             4             3
# 11:             9             8             7             6             5             4
# 12:            10             9             8             7             6             5
# V4_lag_000008 V4_lag_000009 V4_lag_000010
# 1:            NA            NA            NA
# 2:            NA            NA            NA
# 3:            NA            NA            NA
# 4:            NA            NA            NA
# 5:            NA            NA            NA
# 6:            NA            NA            NA
# 7:            NA            NA            NA
# 8:            NA            NA            NA
# 9:             1            NA            NA
# 10:             2             1            NA
# 11:             3             2             1
# 12:             4             3             2

Upvotes: 7

Related Questions