Reputation: 6193
This question is similar to Creating a bunch of lagged variables in data.table at once and How to create a lag variable within each group?, but not quite the same as far as I can tell.
I would like to create several lead variables, e.g. lead1
, lead2
, and lead3
below, grouped by groups
.
Example data
require(data.table)
set.seed(1)
data <- data.table(time =c(1:10,1:8),groups = c(rep(c("a","b"),c(10,8))),
value = rnorm(18))
data
time groups value
1: 1 a -0.62645381
2: 2 a 0.18364332
3: 3 a -0.83562861
4: 4 a 1.59528080
5: 5 a 0.32950777
6: 6 a -0.82046838
7: 7 a 0.48742905
8: 8 a 0.73832471
9: 9 a 0.57578135
10: 10 a -0.30538839
11: 1 b 1.51178117
12: 2 b 0.38984324
13: 3 b -0.62124058
14: 4 b -2.21469989
15: 5 b 1.12493092
16: 6 b -0.04493361
17: 7 b -0.01619026
18: 8 b 0.94383621
The resulting data table should be
time groups value lead1 lead2 lead3
1 1 a -0.62645381 0.18364332 -0.83562861 1.59528080
2 2 a 0.18364332 -0.83562861 1.59528080 0.32950777
3 3 a -0.83562861 1.59528080 0.32950777 -0.82046838
4 4 a 1.59528080 0.32950777 -0.82046838 0.48742905
5 5 a 0.32950777 -0.82046838 0.48742905 0.73832471
6 6 a -0.82046838 0.48742905 0.73832471 0.57578135
7 7 a 0.48742905 0.73832471 0.57578135 -0.30538839
8 8 a 0.73832471 0.57578135 -0.30538839 NA
9 9 a 0.57578135 -0.30538839 NA NA
10 10 a -0.30538839 NA NA NA
11 1 b 1.51178117 0.38984324 -0.62124058 -2.21469989
12 2 b 0.38984324 -0.62124058 -2.21469989 1.12493092
13 3 b -0.62124058 -2.21469989 1.12493092 -0.04493361
14 4 b -2.21469989 1.12493092 -0.04493361 -0.01619026
15 5 b 1.12493092 -0.04493361 -0.01619026 0.94383621
16 6 b -0.04493361 -0.01619026 0.94383621 NA
17 7 b -0.01619026 0.94383621 NA NA
18 8 b 0.94383621 NA NA NA
Note that my actual data set is much larger and I may need more than 3 lead variables.
I am using data.table
version 1.9.4 and am not sure when I will be able to update to the latest version, so a solution in this version would be a bonus. Sorry for this additional constraint.
Thanks in advance.
Upvotes: 1
Views: 1402
Reputation: 93791
The function below will create lead columns for all lead values specified in the leads
argument, which is just a vector of positive integers.
library(data.table)
lead.n = function(leads, values) {
as.data.frame(sapply(leads, function(n) {
lead.vals = c(values[-c(1:n)], rep(NA,n))
}
))
}
data[, paste0("lead",1:3):=lead.n(1:3,value), by=groups]
time groups value lead1 lead2 lead3
1: 1 a -0.62645381 0.18364332 -0.83562861 1.59528080
2: 2 a 0.18364332 -0.83562861 1.59528080 0.32950777
3: 3 a -0.83562861 1.59528080 0.32950777 -0.82046838
4: 4 a 1.59528080 0.32950777 -0.82046838 0.48742905
5: 5 a 0.32950777 -0.82046838 0.48742905 0.73832471
6: 6 a -0.82046838 0.48742905 0.73832471 0.57578135
7: 7 a 0.48742905 0.73832471 0.57578135 -0.30538839
8: 8 a 0.73832471 0.57578135 -0.30538839 NA
9: 9 a 0.57578135 -0.30538839 NA NA
10: 10 a -0.30538839 NA NA NA
11: 1 b 1.51178117 0.38984324 -0.62124058 -2.21469989
12: 2 b 0.38984324 -0.62124058 -2.21469989 1.12493092
13: 3 b -0.62124058 -2.21469989 1.12493092 -0.04493361
14: 4 b -2.21469989 1.12493092 -0.04493361 -0.01619026
15: 5 b 1.12493092 -0.04493361 -0.01619026 0.94383621
16: 6 b -0.04493361 -0.01619026 0.94383621 NA
17: 7 b -0.01619026 0.94383621 NA NA
18: 8 b 0.94383621 NA NA NA
Upvotes: 4
Reputation: 92282
The standard data.table
way would be using the built in shift
function (as was already mentioned in the linked thread). you will need the latest stable version on CRAN for this - v 1.9.6+
library(data.table) # V1.9.6+
data[, paste0("lead", 1L:3L) := shift(value, 1L:3L, type = "lead"), by = groups]
data
# time groups value lead1 lead2 lead3
# 1: 1 a -0.62645381 0.18364332 -0.83562861 1.59528080
# 2: 2 a 0.18364332 -0.83562861 1.59528080 0.32950777
# 3: 3 a -0.83562861 1.59528080 0.32950777 -0.82046838
# 4: 4 a 1.59528080 0.32950777 -0.82046838 0.48742905
# 5: 5 a 0.32950777 -0.82046838 0.48742905 0.73832471
# 6: 6 a -0.82046838 0.48742905 0.73832471 0.57578135
# 7: 7 a 0.48742905 0.73832471 0.57578135 -0.30538839
# 8: 8 a 0.73832471 0.57578135 -0.30538839 NA
# 9: 9 a 0.57578135 -0.30538839 NA NA
# 10: 10 a -0.30538839 NA NA NA
# 11: 1 b 1.51178117 0.38984324 -0.62124058 -2.21469989
# 12: 2 b 0.38984324 -0.62124058 -2.21469989 1.12493092
# 13: 3 b -0.62124058 -2.21469989 1.12493092 -0.04493361
# 14: 4 b -2.21469989 1.12493092 -0.04493361 -0.01619026
# 15: 5 b 1.12493092 -0.04493361 -0.01619026 0.94383621
# 16: 6 b -0.04493361 -0.01619026 0.94383621 NA
# 17: 7 b -0.01619026 0.94383621 NA NA
# 18: 8 b 0.94383621 NA NA NA
Upvotes: 8
Reputation: 37879
You can use function lead
from dplyr
and do it in one call with data.table like this:
library(data.table)
library(dplyr)
data[, c('lead1','lead2','lead3') := list(lead(value, 1), lead(value, 2), lead(value, 3))
, by=groups]
Output
> data
time groups value lead1 lead2 lead3
1: 1 a -0.62645381 0.18364332 -0.83562861 1.59528080
2: 2 a 0.18364332 -0.83562861 1.59528080 0.32950777
3: 3 a -0.83562861 1.59528080 0.32950777 -0.82046838
4: 4 a 1.59528080 0.32950777 -0.82046838 0.48742905
5: 5 a 0.32950777 -0.82046838 0.48742905 0.73832471
6: 6 a -0.82046838 0.48742905 0.73832471 0.57578135
7: 7 a 0.48742905 0.73832471 0.57578135 -0.30538839
8: 8 a 0.73832471 0.57578135 -0.30538839 NA
9: 9 a 0.57578135 -0.30538839 NA NA
10: 10 a -0.30538839 NA NA NA
11: 1 b 1.51178117 0.38984324 -0.62124058 -2.21469989
12: 2 b 0.38984324 -0.62124058 -2.21469989 1.12493092
13: 3 b -0.62124058 -2.21469989 1.12493092 -0.04493361
14: 4 b -2.21469989 1.12493092 -0.04493361 -0.01619026
15: 5 b 1.12493092 -0.04493361 -0.01619026 0.94383621
16: 6 b -0.04493361 -0.01619026 0.94383621 NA
17: 7 b -0.01619026 0.94383621 NA NA
18: 8 b 0.94383621 NA NA NA
Upvotes: 2