Reputation: 6496
I have a data.table
with two fields, startvalue
and endValue
, that I need to populate based in some information from the previous and actual rows. While this is somehow similar to this, and this, I haven't been able to get the results I want.
Dummy Data:
a <- data.table(user = c("A", "A", "A", "B", "B"),
gap = c(1, 0, 2, 2, 3),
priority = c(1, 3, 2, 2, 1))
Then I fix the startValue
for all priorities == 1:
setkey(a, user, priority)
a[priority == 1, startValue := 0]
and I set the endValue
for those which startValue
is already defined:
a[!is.na(startValue), endValue := startValue + gap*3]
Now comes the problem. I want the startValue
in row 2 (user A, priority 2) to be the same as endValue
of row 1, so I can calculate the new endValue
. I know I can use a loop, but I wanted to know if it's possible to do it by using any other function or combination of functions.
I tried several combinations of shift
and zoo:na.locf
but always ended up messing the already existing values.
Expected result:
b <- structure(list(user = c("A", "A", "A", "B", "B"),
gap = c(1, 2, 0, 3, 2),
priority = c(1, 2, 3, 1, 2),
startValue = c(0, 3, 9, 0, 9),
endValue = c(3, 9, 9, 9, 15)),
row.names = c(NA, -5L),
class = c("data.table", "data.frame"))
Upvotes: 1
Views: 61
Reputation: 67778
First, calculate end values using cumsum
. Then use shift
to get start values.
a[ , c("startValue", "endValue") := {
e1 <- startValue[1] + gap[1] * 3
endValue <- c(e1, e1 + cumsum(gap[-1] * 3))
startValue <- shift(endValue, fill = startValue[1])
.(startValue, endValue)
}, by = user]
# user gap priority startValue endValue
# 1: A 1 1 0 3
# 2: A 2 2 3 9
# 3: A 0 3 9 9
# 4: B 3 1 0 9
# 5: B 2 2 9 15
Upvotes: 1
Reputation: 887153
We could use accumulate
from purrr
library(purrr)
library(data.table)
a[, endValue := accumulate(gap, ~ .x + .y * 3, .init = 0)[-1], user
][, startValue := shift(endValue, fill = 0), user][]
all.equal(a, b, check.attributes = FALSE)
#[1] TRUE
Or use Reduce
from base R
to create the 'endValue' column and then take the lag
of the 'endValue' for creating the 'startValue' grouped by 'user'
a[, endValue := Reduce(function(x, y) x + y *3, gap,
accumulate = TRUE, init = 0)[-1], user]
Upvotes: 2