Updating a data.table column sequentially with loop

Question

I've implemented a simple dynamic programming example described here, using data.table, in the hope that it would be as fast as vectorized code.

library(data.table)
B=100; M=50; alpha=0.5; beta=0.9;
n = B + M + 1
m = M + 1
u <- function(c)c^alpha
dt <- data.table(s = 0:(B+M))[, .(a = 0:min(s, M)), s] # State Space and corresponging Action Space
dt[, u := (s-a)^alpha,]                                # rewards r(s, a)
dt <- dt[, .(s_next = a:(a+B), u = u), .(s, a)]        # all possible (s') for each (s, a)
dt[, p := 1/(B+1), s]                                  # transition probs

#          s  a s_next  u        p
#     1:   0  0      0  0 0.009901
#     2:   0  0      1  0 0.009901
#     3:   0  0      2  0 0.009901
#     4:   0  0      3  0 0.009901
#     5:   0  0      4  0 0.009901
#    ---                          
#649022: 150 50    146 10 0.009901
#649023: 150 50    147 10 0.009901
#649024: 150 50    148 10 0.009901
#649025: 150 50    149 10 0.009901
#649026: 150 50    150 10 0.009901

To give a little content to my question: conditional on s and a, future values of s (s_next) is realized as one of a:(a+10), each with probability p=1/(B + 1). u column gives the u(s, a) for each combination (s, a).

Given the initial values V(always n by 1 vector) for each unique state s, V updates according to V[s]=max(u(s, a)) + beta* sum(p*v(s_next)) (Bellman Equation).
Maximization is wrt a, hence, [, `:=`(v = max(v), i = s_next[which.max(v)]), by = .(s)] in the iteration below.

Actually there is very efficient vectorized solution. I thought data.table solution would be comparable in performance as vectorized approach.

I know that the main culprit is dt[, v := V[s_next + 1]]. Alas, I have no idea how to fix it.

# Iteration starts here
system.time({
  V <- rep(0, n)  # initial guess for Value function
  i <- 1
  tol <- 1
  while(tol > 0.0001){
    dt[, v := V[s_next + 1]] 
    dt[, v := u + beta * sum(p*v), by = .(s, a)
       ][, `:=`(v = max(v), i = s_next[which.max(v)]), by = .(s)] # Iteration
    dt1 <- dt[, .(v[1L], i[1L]), by = s]
    Vnew <- dt1$V1           
    sig <- dt1$V2
    tol <- max(abs(V - Vnew))
    V <- Vnew
    i <- i + 1
  }    
})
#  user  system elapsed 
#  5.81    0.40    6.25

To my dismay, the data.table solution is even slower than the following highly non-vectorized solution. As a sloppy data.table-user, I must be missing some data.table functionality. Is there a way to improve things, or, data.table is not suitable for these kinds of computations?

S <- 0:(n-1) # StateSpace
VFI <- function(V){
  out <- rep(0, length(V))
  for(s in S){
    x <- -Inf
    for(a in 0:min(s, M)){
      s_next <- a:(a+B)     # (s')
      x <- max(x, u(s-a) + beta * sum(V[s_next + 1]/(B+1)))
    }
    out[s+1] <- x
  }
  out
}
system.time({
V <- rep(0, n)  # initial guess for Value function
i <- 1
tol <- 1
while(tol > 0.0001){
  Vnew <- VFI(V)           
  tol <- max(abs(V - Vnew))
  V <- Vnew
  i <- i + 1
}    
})
# user  system elapsed 
# 3.81    0.00    3.81

MichaelChirico · Accepted Answer

Here's how I would do this...

DT = CJ(s = seq_len(n)-1L, a = seq_len(m)-1L, s_next = seq_len(n)-1L)
DT[ , p := 0]
#p is 0 unless this is true
DT[between(s_next, a, a + B), p := 1/(B+1)]
#may as well subset to eliminate irrelevant states
DT = DT[p>0 & s>=a]
DT[ , util := u(s - a)]

#don't technically need by, but just to be careful
DT[ , V0 := rep(0, n), by = .(a, s_next)]

while(TRUE) {
  #for each s, maximize given past value;
  #  within each s, have to sum over s_nexts,
  #  to do so, sum by a
  DT[ , V1 := max(.SD[ , util[1L] + beta*sum(V0*p), by = a],
               na.rm = TRUE), by = s]
  if (DT[ , max(abs(V0 - V1))] < 1e-4) break
  DT[ , V0 := V1]
}

On my machine this takes about 15 seconds (so not good)... but maybe this will give you some ideas. For example, this data.table is far too large since there really only are n unique values of V ultimately.

Updating a data.table column sequentially with loop

Answers (1)

Related Questions