PavoDive
PavoDive

Reputation: 6496

Define value by previous and actual row

I have a data.table with two fields, startvalue and endValue, that I need to populate based in some information from the previous and actual rows. While this is somehow similar to this, and this, I haven't been able to get the results I want.

Dummy Data:

a <- data.table(user = c("A", "A", "A", "B", "B"), 
                gap = c(1, 0, 2, 2, 3), 
                priority = c(1, 3, 2, 2, 1))

Then I fix the startValue for all priorities == 1:

setkey(a, user, priority)
a[priority == 1, startValue := 0]

and I set the endValue for those which startValue is already defined:

a[!is.na(startValue), endValue := startValue + gap*3]

Now comes the problem. I want the startValue in row 2 (user A, priority 2) to be the same as endValue of row 1, so I can calculate the new endValue. I know I can use a loop, but I wanted to know if it's possible to do it by using any other function or combination of functions.

I tried several combinations of shift and zoo:na.locf but always ended up messing the already existing values.

Expected result:

b <- structure(list(user = c("A", "A", "A", "B", "B"), 
                    gap = c(1, 2, 0, 3, 2), 
                    priority = c(1, 2, 3, 1, 2), 
                    startValue = c(0, 3, 9, 0, 9), 
                    endValue = c(3, 9, 9, 9, 15)), 
               row.names = c(NA, -5L), 
               class = c("data.table", "data.frame"))

Upvotes: 1

Views: 61

Answers (2)

Henrik
Henrik

Reputation: 67778

First, calculate end values using cumsum. Then use shift to get start values.

a[ , c("startValue", "endValue") := {
  e1 <- startValue[1] + gap[1] * 3
  endValue <- c(e1, e1 + cumsum(gap[-1] * 3))
  startValue <- shift(endValue, fill = startValue[1])
  .(startValue, endValue)
}, by = user]

#    user gap priority startValue endValue
# 1:    A   1        1          0        3
# 2:    A   2        2          3        9
# 3:    A   0        3          9        9
# 4:    B   3        1          0        9
# 5:    B   2        2          9       15

Upvotes: 1

akrun
akrun

Reputation: 887153

We could use accumulate from purrr

library(purrr)
library(data.table)
a[, endValue := accumulate(gap,  ~   .x + .y * 3, .init = 0)[-1], user
   ][, startValue := shift(endValue, fill = 0), user][]
all.equal(a, b, check.attributes = FALSE)
#[1] TRUE

Or use Reduce from base R to create the 'endValue' column and then take the lag of the 'endValue' for creating the 'startValue' grouped by 'user'

a[, endValue := Reduce(function(x, y) x + y *3, gap, 
     accumulate = TRUE, init = 0)[-1], user]

Upvotes: 2

Related Questions