Reputation: 2022
Background:
Lets say I have this code
library(data.table)
#reproducibility
set.seed(45L)
#make table
dt <- data.table(V1=c(1L,2L),
V2=LETTERS[1:3],
V3=round(rnorm(4),4),
V4 = 1:12)
dt
for which I get
> dt
V1 V2 V3 V4
1: 1 A 0.3408 1
2: 2 B -0.7033 2
3: 1 C -0.3795 3
4: 2 A -0.7460 4
5: 1 B 0.3408 5
6: 2 C -0.7033 6
7: 1 A -0.3795 7
8: 2 B -0.7460 8
9: 1 C 0.3408 9
10: 2 A -0.7033 10
11: 1 B -0.3795 11
12: 2 C -0.7460 12
and I want to append create 10 columns of the lags of V3.
Question:
Is there a way to do it within the "data-table" paradigm.
More details:
If it were a data.frame then I could do a loop.
dt <- as.data.frame(dt)
for(i in 1:10){
dt <- cbind(dt, shift(x = dt[, 3],
n = i,
fill = NA,
type = "lag"))
names(dt)[ncol(dt)] <- sprintf("lag_%06d",i)
}
dt
for which I get:
> dt
V1 V2 V3 V4 lag_000001 lag_000002 lag_000003 lag_000004 lag_000005 lag_000006 lag_000007 lag_000008 lag_000009 lag_000010
1 1 A 0.3408 1 NA NA NA NA NA NA NA NA NA NA
2 2 B -0.7033 2 0.3408 NA NA NA NA NA NA NA NA NA
3 1 C -0.3795 3 -0.7033 0.3408 NA NA NA NA NA NA NA NA
4 2 A -0.7460 4 -0.3795 -0.7033 0.3408 NA NA NA NA NA NA NA
5 1 B 0.3408 5 -0.7460 -0.3795 -0.7033 0.3408 NA NA NA NA NA NA
6 2 C -0.7033 6 0.3408 -0.7460 -0.3795 -0.7033 0.3408 NA NA NA NA NA
7 1 A -0.3795 7 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 NA NA NA NA
8 2 B -0.7460 8 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 NA NA NA
9 1 C 0.3408 9 -0.7460 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 NA NA
10 2 A -0.7033 10 0.3408 -0.7460 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 NA
11 1 B -0.3795 11 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408
12 2 C -0.7460 12 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033 0.3408 -0.7460 -0.3795 -0.7033
There has to be a more elegant way. Something that can handle much larger data much more quickly and efficiently.
Now what if I wanted to first do this, to make 10 lags, on column V3, then on V4, without presupposing their names. I could simply make nested for loops, but again I suspect that data.table has something good (great?) to offer there.
Upvotes: 4
Views: 3210
Reputation: 522
The answer by Mike H. didn't work for me -- I think likely due to data.table updates. A modified form is:
lagcols <- c('test1','test2') #input column names
lagcols2<- paste0("lag", #output column names
rep(lag, times = length(lagcols)),'_',
rep(lagcols, each = length(lag)))
lag <- c(1,2,10) #desired lags
dt[, (lagcols2) := shift(.SD, lag), by=id]
Upvotes: 1
Reputation: 14370
It's also possible to do it all in 1 data.table
call. This could be particularly useful if you need to a lot of columns:
cols <- c("V3","V4")
dt[, (paste0("lag_",rep(cols, each = 10), "_", rep(1:10, times = length(cols)))) :=
unlist(lapply(.SD, function(x) shift(x, 1:10, type = "lag")), recursive = F), .SDcols = cols]
The paste0(...)
code sets up the column names the way we want and then the unlist(lapply(...))
code gets the lags for each column in the order we want. To see how each works you can run them individually (if running the unlist(lapply(...))
you'll have to substitute in dt[,c("V3","V4")]
for .SD
.
dt[,1:9]
# V1 V2 V3 V4 lag_V3_1 lag_V4_1 lag_V3_2 lag_V4_2 lag_V3_3
# 1: 1 A 0.3408 1 NA NA NA NA NA
# 2: 2 B -0.7033 2 0.3408 1 NA NA NA
# 3: 1 C -0.3795 3 -0.7033 2 0.3408 1 NA
# 4: 2 A -0.7460 4 -0.3795 3 -0.7033 2 0.3408
# 5: 1 B 0.3408 5 -0.7460 4 -0.3795 3 -0.7033
# 6: 2 C -0.7033 6 0.3408 5 -0.7460 4 -0.3795
# 7: 1 A -0.3795 7 -0.7033 6 0.3408 5 -0.7460
# 8: 2 B -0.7460 8 -0.3795 7 -0.7033 6 0.3408
# 9: 1 C 0.3408 9 -0.7460 8 -0.3795 7 -0.7033
#10: 2 A -0.7033 10 0.3408 9 -0.7460 8 -0.3795
#11: 1 B -0.3795 11 -0.7033 10 0.3408 9 -0.7460
#12: 2 C -0.7460 12 -0.3795 11 -0.7033 10 0.3408
Upvotes: 4
Reputation: 17299
Yes, shift
considers this type of user-demand. When parameter n
for shift
is a vector, shift
will return a list for each shift in n
:
dt[, sprintf("V3_lag_%06d", 1:10) := shift(V3, 1:10, type = 'lag')]
dt[, sprintf("V4_lag_%06d", 1:10) := shift(V4, 1:10, type = 'lag')]
# V1 V2 V3 V4 V3_lag_000001 V3_lag_000002 V3_lag_000003 V3_lag_000004 V3_lag_000005
# 1: 1 A 1.2322 1 NA NA NA NA NA
# 2: 2 B 1.6094 2 1.2322 NA NA NA NA
# 3: 1 C 0.4016 3 1.6094 1.2322 NA NA NA
# 4: 2 A -0.2730 4 0.4016 1.6094 1.2322 NA NA
# 5: 1 B 1.2322 5 -0.2730 0.4016 1.6094 1.2322 NA
# 6: 2 C 1.6094 6 1.2322 -0.2730 0.4016 1.6094 1.2322
# 7: 1 A 0.4016 7 1.6094 1.2322 -0.2730 0.4016 1.6094
# 8: 2 B -0.2730 8 0.4016 1.6094 1.2322 -0.2730 0.4016
# 9: 1 C 1.2322 9 -0.2730 0.4016 1.6094 1.2322 -0.2730
# 10: 2 A 1.6094 10 1.2322 -0.2730 0.4016 1.6094 1.2322
# 11: 1 B 0.4016 11 1.6094 1.2322 -0.2730 0.4016 1.6094
# 12: 2 C -0.2730 12 0.4016 1.6094 1.2322 -0.2730 0.4016
# V3_lag_000006 V3_lag_000007 V3_lag_000008 V3_lag_000009 V3_lag_000010 V4_lag_000001
# 1: NA NA NA NA NA NA
# 2: NA NA NA NA NA 1
# 3: NA NA NA NA NA 2
# 4: NA NA NA NA NA 3
# 5: NA NA NA NA NA 4
# 6: NA NA NA NA NA 5
# 7: 1.2322 NA NA NA NA 6
# 8: 1.6094 1.2322 NA NA NA 7
# 9: 0.4016 1.6094 1.2322 NA NA 8
# 10: -0.2730 0.4016 1.6094 1.2322 NA 9
# 11: 1.2322 -0.2730 0.4016 1.6094 1.2322 10
# 12: 1.6094 1.2322 -0.2730 0.4016 1.6094 11
# V4_lag_000002 V4_lag_000003 V4_lag_000004 V4_lag_000005 V4_lag_000006 V4_lag_000007
# 1: NA NA NA NA NA NA
# 2: NA NA NA NA NA NA
# 3: 1 NA NA NA NA NA
# 4: 2 1 NA NA NA NA
# 5: 3 2 1 NA NA NA
# 6: 4 3 2 1 NA NA
# 7: 5 4 3 2 1 NA
# 8: 6 5 4 3 2 1
# 9: 7 6 5 4 3 2
# 10: 8 7 6 5 4 3
# 11: 9 8 7 6 5 4
# 12: 10 9 8 7 6 5
# V4_lag_000008 V4_lag_000009 V4_lag_000010
# 1: NA NA NA
# 2: NA NA NA
# 3: NA NA NA
# 4: NA NA NA
# 5: NA NA NA
# 6: NA NA NA
# 7: NA NA NA
# 8: NA NA NA
# 9: 1 NA NA
# 10: 2 1 NA
# 11: 3 2 1
# 12: 4 3 2
Upvotes: 7