user3740289
user3740289

Reputation: 265

Recursive update of columns in R data.table

As a simple example I need to create the following table (this is my desired result):

library(data.table)
DT <- data.table( A= c(2,5,4,-2,4), 
                  B= c(1000,1100,1375,1650,1485), 
                  C= c(50,55,68.75,82.5,74.25), 
                  D= c(100,275,275,-165,297))
DT

This is my attempt so far which is not working:

DT.2 <- data.table(A= c(2,5,4,-2,4) )
DT.2[, B := 1000 ]  # B should begin at 1000, then cumulatively add column D
DT.2[, C := B * 0.05 ]  
DT.2[, D := A * C ]  
DT.2[, B := 1000 + cumsum(shift(D, type= "lag", fill=FALSE)) ]
DT.2

As you can see the columns do not update correctly as each column relies on the results of other columns. It's a very easy calculation in Excel and I just need to understand how to adapt it to R. Thank you

Upvotes: 0

Views: 1678

Answers (2)

geneorama
geneorama

Reputation: 3720

I think that you actually want a function, and I am guessing that it would make more sense like this:

fun <- function(id, ## not sure what this is, probably an id
                start = 1000, 
                rate = .05,
                gain_or_loss){
    require(data.table)

    cnt <- length(id)
    ret <- data.table(ID = c(id, NA),
                      bal = cumsum(c(start, gain_or_loss)),
                      fixed_change = rep(NA_real_, cnt + 1),
                      total_change = c(gain_or_loss, NA_real_))
    ret[ , fixed_change := bal * 0.05 ]  
    ret <- ret[] ## needed because of a data.table quirk
    return(ret)
}

Which would give:

fun(id = c(2,5,4,-2,4),
    gain_or_loss = c(100,275,275,-165,297))

#    ID  bal fixed_change total_change
# 1:  2 1000        50.00          100
# 2:  5 1100        55.00          275
# 3:  4 1375        68.75          275
# 4: -2 1650        82.50         -165
# 5:  4 1485        74.25          297
# 6: NA 1782        89.10           NA

I would also add code after require to say if (length(id) != length(gain_or_loss)) stop("id and gain_or_loss need to be of the same length")

Upvotes: 1

MichaelChirico
MichaelChirico

Reputation: 34703

So far as I can tell, you need a loop. Here's my input data:

DT <- data.table(A = c(2, 5, 4, -2, 4),
                 B = c(1000, rep(NA, 4)),
                 C = numeric(5),
                 D = numeric(5))

And the loop I use:

#initial row
DT[1, c("C", "D") := .(.05 * B, .05 * A * B)]

#subsequent rows
for (nn in 2:nrow(DT)){
  new_B <- DT[nn - 1L, B + D]
  DT[nn, c("B", "C", "D") := .(new_B, .05 * new_B, .05 * A * new_B)]
}

Could also easily be translated to a sapply so you don't have nn sitting around in your namespace (but now it will print something meaningless to your console):

sapply(2:nrow(DT), function(nn){
  (same as before)})

And yes, I can imagine this would look much simpler in Excel since it auto-updates the formula for you with click+drag.

Upvotes: 3

Related Questions