Reputation: 3081
I have the following code that I would like to optimise, but I am currently not sure how. First, let me give an introduction to the problem: test.data contains approximately 200,000 rows, which makes the implementation below extremely slow in R. The first thing I tried was to optimise the functions and remove as much testing (if statements) as possible; however, I am unable to do so in two instances in the code below.
library(data.table)

test.data <- data.table(person = c("A", "B", "C"),
                        duration = c(120, 50, 30),
                        time = c(159, 231, 312),
                        savings = c(140000, 200000, 300000),
                        ren = c(0.0037, 0.0011, 0.0015),
                        res = c(55, 10, 30))
set.seed(35)
# Deduction series; note that in this example they are arbitrary.
# They do not follow a pattern. I believe this is the core of the problem,
# which makes it extremely difficult to vectorise, since there is
# no closed-form solution.
c_a <- round(runif(max(test.data$duration)), 2) / 10
# Put in as a constant, but it can vary arbitrarily.
c_b <- rep(round((8.5 / 12) / 100, digits = 4), max(test.data$duration))
rnew <- 0.25
result <- matrix(0, nrow = 6, ncol = 120)

for (j in 1:nrow(test.data)) {
  savings  <- test.data$savings[j]
  duration <- test.data$duration[j]
  time     <- test.data$time[j]
  res      <- test.data$res[j]

  m <- matrix(nrow = 6, ncol = duration)
  for (i in 1:duration) {
    m[1, i] <- ifelse(i == 1, savings, m[6, i - 1])  # carry forward last balance
    m[2, i] <- -m[1, i] * c_a[i]
    m[3, i] <- -(m[1, i] + m[2, i]) * c_b[i]
    m[4, i] <- ifelse(i == duration, -(m[1, i] + m[2, i] + m[3, i]),
                      -(m[1, i] + m[2, i]) / (time + 1 - i))
    if (i == res & res < time) {
      m[5, i] <- -(m[1, i] + m[2, i]) * (1 - rnew)
    } else {
      m[5, i] <- 0
    }
    m[6, i] <- m[1, i] + m[2, i] + m[3, i] + m[4, i] + m[5, i]
  }
  # pad m to 120 columns, then accumulate element-wise into result
  m <- cbind(m, matrix(0, ncol = ncol(result) - ncol(m), nrow = nrow(result)))
  result <- matrix(mapply(sum, result, m, MoreArgs = list(na.rm = TRUE)),
                   ncol = ncol(result))
}
Second, I have tried to vectorise the code, but I believe this is not doable, since c_a and c_b are random values, so I can't simply raise things to a certain power. To vectorise the code I believe I would need a closed-form expression, but I am unable to derive one.
The third problem I came across is memory: if I store all intermediate results in memory, everything blows up to 3 * 120 * 6 values even in this small example, which is quite a significant memory increase in my opinion, so I am effectively forced to do this one person at a time.
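For scale, a rough back-of-the-envelope estimate (assuming every one of the ~200,000 persons kept a full 6 x 120 matrix of 8-byte doubles):

# ~200,000 persons, a 6 x 120 double matrix each
200000 * 6 * 120 * 8 / 2^30  # ~1.07 GiB of intermediate results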
In addition, I have tried %dopar%, but unfortunately memory constraints do not allow me to use more than 2 cores (16 GB of memory). Now I am wondering what further optimisation techniques exist without going as deep as Rcpp.
Upvotes: 0
Views: 63
Reputation: 25225
A possible approach to calculate the sum of the outstanding amounts (i.e. OP's row 1 of result). All the intermediate values (m[2,i], m[3,i], m[4,i], m[5,i]) can be calculated easily afterwards if required; see the sketch after the output below. Caveat: I did not time it with the actual dimensions.
library(data.table)

calcAmor <- function(ca, cb, rnew, dur, S0, tau, res) {
  # one amortisation step: S is the outstanding amount, ca.t = c(ca[i], i)
  amortize <- function(S, ca.t) {
    S - ca.t[1L] * S - (1 - ca.t[1L]) * cb * S -
      (S - ca.t[1L] * S) / (tau + 1 - ca.t[2L])
  }
  ans <- Reduce(amortize,
                split(cbind(ca, seq_along(ca)), seq_along(ca)),
                init = S0,
                accumulate = TRUE)[-(dur + 1L)]
  # redo the periods after the reset at i == res with the reduced balance
  ix <- min(res + 1L, dur):dur
  tmp <- Reduce(amortize,
                split(cbind(ca[ix], ix), seq_along(ix)),
                init = amortize(ans[res], c(ca[res], res)) -
                  (ans[res] - ans[res] * ca[res]) * (1 - rnew),
                accumulate = TRUE)
  ans[ix] <- tmp[-length(tmp)]
  ans
}
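One non-obvious piece is the split(cbind(ca, seq_along(ca)), seq_along(ca)) call: Reduce() passes only one element of x per step, so each deduction rate is bundled with its time index into a length-2 vector. A quick illustration with made-up values:

split(cbind(c(0.01, 0.02, 0.03), 1:3), 1:3)
$`1`
[1] 0.01 1.00

$`2`
[1] 0.02 2.00

$`3`
[1] 0.03 3.00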
set.seed(35)
test.data <- data.table(person = c("A", "B", "C"),
                        duration = c(120, 50, 30),
                        time = c(159, 231, 312),
                        savings = c(140000, 200000, 300000),
                        res = c(55, 10, 30))

maxd <- test.data[, max(duration)]
c_a <- round(runif(maxd), 2) / 10
rnew <- 0.25
cb <- round((8.5 / 12) / 100, digits = 4)

# per-person series of outstanding amounts, then sum across persons by period
test.data[, .(
  dur = seq_len(duration),
  S   = calcAmor(ca = c_a[seq_len(duration)], cb, rnew, dur = duration,
                 S0 = savings, tau = time, res = res)),
  by = .(person)][, sum(S), by = .(dur)]
output:
dur V1
1: 1 6.400000e+05
2: 2 5.783318e+05
3: 3 5.711966e+05
4: 4 5.336450e+05
5: 5 4.774502e+05
---
116: 116 7.075169e+00
117: 117 6.788631e+00
118: 118 6.339002e+00
119: 119 5.639335e+00
120: 120 5.297898e+00
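And here is a sketch of one way to rebuild the full 6 x duration matrix for a single person from its S series, reusing the formulas from the question (the helper name intermediates is mine):

intermediates <- function(S, ca, cb, rnew, dur, tau, res) {
  i  <- seq_along(S)
  m2 <- -S * ca[i]                  # deduction c_a
  m3 <- -(S + m2) * cb              # deduction c_b
  m4 <- ifelse(i == dur, -(S + m2 + m3),
               -(S + m2) / (tau + 1 - i))
  m5 <- ifelse(i == res & res < tau, -(S + m2) * (1 - rnew), 0)
  rbind(S, m2, m3, m4, m5, S + m2 + m3 + m4 + m5)
}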
Upvotes: 1
Reputation: 5059
The only thing I can suggest is to initialize m only once, with the same dimensions as result, and to replace the last 2 lines of the outer loop as shown below. This avoids re-allocating* m and the element-wise sum done with mapply.
result <- matrix(0, nrow = 6, ncol = 120)
m <- result

for (j in 1:nrow(test.data)) {
  savings  <- test.data$savings[j]
  duration <- test.data$duration[j]
  time     <- test.data$time[j]
  res      <- test.data$res[j]

  for (i in 1:duration) {
    m[1, i] <- ifelse(i == 1, savings, m[6, i - 1])
    m[2, i] <- -m[1, i] * c_a[i]
    m[3, i] <- -(m[1, i] + m[2, i]) * c_b[i]
    m[4, i] <- ifelse(i == duration, -(m[1, i] + m[2, i] + m[3, i]),
                      -(m[1, i] + m[2, i]) / (time + 1 - i))
    if (i == res & res < time) {
      m[5, i] <- -(m[1, i] + m[2, i]) * (1 - rnew)
    } else {
      m[5, i] <- 0
    }
    m[6, i] <- m[1, i] + m[2, i] + m[3, i] + m[4, i] + m[5, i]
  }
  # only the first `duration` columns were written this iteration, so only
  # those are accumulated; stale columns beyond that are simply ignored
  result[, 1:duration] <- result[, 1:duration] + m[, 1:duration]
}
Your inner loop has complex dependencies on the results of previous iterations, so I don't know whether it can leverage vectorised operations.
*Technically, R copies the matrix when it is modified while the object is shared, due to its copy-on-modify semantics. I believe R does some special things with its memory management so that each copy doesn't necessarily equate to a new allocation, but the copy still represents overhead.
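If you want to see where copies actually happen, base R's tracemem() reports each duplication; here is a minimal standalone illustration (separate from the code above):

m <- matrix(0, nrow = 6, ncol = 120)
tracemem(m)                           # start reporting copies of m
f <- function(x) { x[1, 1] <- 1; x }  # modifying the argument forces a copy
invisible(f(m))                       # prints a tracemem line: x was shared with m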
Since you're doing element-wise operations, that may very well be your bottleneck, and migrating to C or C++ with Rcpp might be your best bet.
Upvotes: 0