maarvd
maarvd

Reputation: 1284

Fast way to calculate value in cell based on value in previous row (data.table)

Say I have the following dataset dt and a constant constant.

dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072, 
                              -18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA)), class = c("data.table", 
                                                   "data.frame"), row.names = c(NA, -6L))

constant <- 608383

print(dt)
         var1  var2
1: -92186.747 -3.12
2: -19163.504    NA
3: -18178.840    NA
4:  -9844.679    NA
5: -16494.780    NA
6: -17088.058    NA

The value of var2 depends on the value of the previous row, like so

for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]
  dt[i, var2 := prev$var2 - var1/constant]
}

print(dt)
         var1      var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4:  -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238

Though this for loop provides the desired output, it is very slow on a big dataset. Is there a faster way to achieve this output? Preferably using data.table.

Upvotes: 4

Views: 524

Answers (2)

AnilGoyal
AnilGoyal

Reputation: 26218

BaseR variant of above answer proposed by dear friend (which you can use easily in your data.table syntax)

dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072, 
                              -18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA)), class = c("data.table", 
                                                   "data.frame"), row.names = c(NA, -6L))

dt
#>         var1  var2
#> 1 -92186.747 -3.12
#> 2 -19163.504    NA
#> 3 -18178.840    NA
#> 4  -9844.679    NA
#> 5 -16494.780    NA
#> 6 -17088.058    NA
constant <- 608383
Reduce(function(.x, .y) .x - .y /constant , dt$var1[-1], init = dt$var2[1], accumulate = T)
#> [1] -3.120000 -3.088501 -3.058620 -3.042439 -3.015326 -2.987238

Created on 2021-05-26 by the reprex package (v2.0.0)

Upvotes: 2

Anoushiravan R
Anoushiravan R

Reputation: 21908

Here is a solution using accumulate function from purrr package in case you were interested. In this solution .y represents the current value of var1 that we would like to iterate over and .x represents the accumulated value that we calculate and put in var2 column. As you might have noticed I excluded the first value of var1 as it we don't apply our formula on it.

library(dplyr)
library(purrr)

dt %>%
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y /constant))


         var1      var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4:  -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238

Upvotes: 4

Related Questions