Fast way to calculate value in cell based on value in previous row (data.table)

Question

Say I have the following dataset dt and a constant constant.

dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072, 
                              -18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA)), class = c("data.table", 
                                                   "data.frame"), row.names = c(NA, -6L))

constant <- 608383

print(dt)
         var1  var2
1: -92186.747 -3.12
2: -19163.504    NA
3: -18178.840    NA
4:  -9844.679    NA
5: -16494.780    NA
6: -17088.058    NA

The value of var2 depends on the value of the previous row, like so

for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]
  dt[i, var2 := prev$var2 - var1/constant]
}

print(dt)
         var1      var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4:  -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238

Though this for loop provides the desired output, it is very slow on a big dataset. Is there a faster way to achieve this output? Preferably using data.table.

Anoushiravan R · Accepted Answer

Here is a solution using accumulate function from purrr package in case you were interested. In this solution .y represents the current value of var1 that we would like to iterate over and .x represents the accumulated value that we calculate and put in var2 column. As you might have noticed I excluded the first value of var1 as it we don't apply our formula on it.

library(dplyr)
library(purrr)

dt %>%
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y /constant))


         var1      var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4:  -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238

Fast way to calculate value in cell based on value in previous row (data.table)

Answers (2)

Related Questions