Mike.Gahan
Mike.Gahan

Reputation: 4615

Shifting row values by lag value in another column

I have a rather large dataset and I am interested in "marching" values forward through time based on values from another column. For example, if I have a Value = 3 at Time = 0 and a DesiredShift = 2, I want the 3 to shift down two rows to be at Time = 2. Here is a reproducible example.

Build reproducible fake data

library(data.table)
set.seed(1)
rowsPerID <- 8
dat <- CJ(1:2, 1:rowsPerID)
setnames(dat, c("ID","Time"))
dat[, Value := rpois(.N, 4)]
dat[, Shift := sample(0:2, size=.N, replace=TRUE)]

Fake Data

#     ID Time Value Shift
#  1:  1    1     3     2
#  2:  1    2     3     2
#  3:  1    3     4     1
#  4:  1    4     7     2
#  5:  1    5     2     2
#  6:  1    6     7     0
#  7:  1    7     7     1
#  8:  1    8     5     0
#  9:  2    1     5     0
# 10:  2    2     1     1
# 11:  2    3     2     0
# 12:  2    4     2     1
# 13:  2    5     5     2
# 14:  2    6     3     1
# 15:  2    7     5     1
# 16:  2    8     4     1

I want each Value to shift forward according the the Shift column. So the DesiredOutput column for row 3 will be equal to 3 since the value at Time=1 is Value = 3 and Shift = 2.

Row 4 shows 3+4=7 since 3 shifts down 2 and 4 shifts down 1.

I would like to be able to do this by ID group and hopefully take advantage of data.table since speed is of interest for this problem.

Desired Result

#     ID Time Value Shift DesiredOutput
#  1:  1    1     3     2       NA
#  2:  1    2     3     2       NA
#  3:  1    3     4     1       3
#  4:  1    4     7     2       3+4 = 7
#  5:  1    5     2     2       NA
#  6:  1    6     7     0       7+7 = 14
#  7:  1    7     7     1       2
#  8:  1    8     5     0       7+5 = 12
#  9:  2    1     5     0       5
# 10:  2    2     1     1       NA
# 11:  2    3     2     0       1+2 = 3
# 12:  2    4     2     1       NA
# 13:  2    5     5     2       2
# 14:  2    6     3     1       NA
# 15:  2    7     5     1       3+5=8
# 16:  2    8     4     1       5

I was hoping to get this working using the data.table::shift function, but I am unsure how to make this work using multiple lag parameters.

Upvotes: 4

Views: 873

Answers (1)

sirallen
sirallen

Reputation: 1966

Try this:

dat[, TargetIndex:= .I + Shift]

toMerge = dat[, list(Out = sum(Value)), by='TargetIndex']

dat[, TargetIndex:= .I]

# dat = merge(dat, toMerge, by='TargetIndex', all=TRUE)
dat[toMerge, on='TargetIndex', DesiredOutput:= i.Out]

> dat
#     ID Time Value Shift TargetIndex DesiredOutput
#  1:  1    1     3     2           1            NA
#  2:  1    2     3     2           2            NA
#  3:  1    3     4     1           3             3
#  4:  1    4     7     2           4             7
#  5:  1    5     2     2           5            NA
#  6:  1    6     7     0           6            14
#  7:  1    7     7     1           7             2
#  8:  1    8     5     0           8            12
#  9:  2    1     5     0           9             5
# 10:  2    2     1     1          10            NA
# 11:  2    3     2     0          11             3
# 12:  2    4     2     1          12            NA
# 13:  2    5     5     2          13             2
# 14:  2    6     3     1          14            NA
# 15:  2    7     5     1          15             8
# 16:  2    8     4     1          16             5

Upvotes: 6

Related Questions