Tim_Utrecht
Tim_Utrecht

Reputation: 1519

Speed up random Markov Chain in R using data.table or parellelisation

I am trying to speed up a Monte Carlo simulation of a discrete time-inhomogeneous Markov chain using data.table or some form of parallelisation. Using random dummy transition matrices TM, I am simulating nSteps time steps in each of N simulations and starting from a an initial state vector initialState record the next updated state in currentState. At each time step, I matrix multiply the current state with the transition matrix TM.

Code 1 with loop

nStates <- 5 #number of states
initialState <- c(rep(1/nStates, nStates)) #vector with uniform initial states
nSteps <- 10 #number of time steps
N <- 10000 #number of simulations

ind.arr <- matrix(1:(N*nSteps),ncol=nSteps, byrow=TRUE)
currentState <- vector("list",(N*(nSteps))) #collects the nSteps state vectors for each simulation

system.time(
  for (i in 1:N) {
    TM <- matrix(runif(nStates^2), ncol=nStates) #random transition matrix for each time step and each simulation
    currentState[[(ind.arr[i,1])]] <- initialState %*% (TM / rowSums(TM)) #/rowSums(TM) ensures that TM is a transition matrix
    for (t in 2:nSteps){
      TM <- matrix(runif(nStates^2), ncol=nStates)
      currentState[[(ind.arr[i,t])]] <- currentState[[(ind.arr[i,t-1])]] %*% (TM / rowSums(TM))
    }
  })

The code is not super slow, but I am wondering whether avoiding the N-loop can accelerate the code. If I put the body of the N-loop in a function

statefun <- function(initialState, nSteps, nStates){
  TM <- matrix(runif(nStates^2), ncol=nStates) #random transition matrix for each time step and each simulation
  currentState <- matrix(rep(NA, nSteps*nStates), ncol=nStates)
  currentState[1,] <- initialState %*% (TM / rowSums(TM)) #/rowSums(TM) ensures that TM is a transition matrix
  for (t in 2:nSteps){
    TM <- matrix(runif(nStates^2), ncol=nStates)
    currentState[t,] <- currentState[t-1,] %*% (TM / rowSums(TM))
  }
  return(currentState)
}

and use data.table, I get an error and not the desired result

library(data.table)
system.time(dt <- data.table(i=1:N)[, c("s1", "s2", "s3", "s4", "s5") := list(statefun(initialState, nSteps, nStates)), by=i])

#As each simulation run is independent and the call of statefun is expensive, I was hoping that parallelisation helps to accelerate the code, but trying foreach is actually slower than where I started.  

library(foreach)
system.time(res <- foreach(i=1:N, .combine='c') %do% statefun(initialState, nSteps, nStates))

I appreciate any comment on how to make data.table work or using parallelisation in this case. Many thanks, Tim

@ EDIT:This one doesn't pick up the ten row output of the function call...

system.time( #does not work 
  dt <- data.table(i=1:N)[,c("s1", "s2", "s3", "s4", "s5"):=as.list(statefun(initialState, nSteps, nStates)),by=i]
)

Upvotes: 1

Views: 512

Answers (2)

Steve Weston
Steve Weston

Reputation: 19677

If you convert the outer for loop into a foreach loop with 10,000 tasks, the performance isn't good because the tasks are too small. It's often better to make the number of tasks equal to the number of workers. Here's a simple way to do that using the idiv function from the iterators package:

library(doParallel)
nw <- 4
cl <- makePSOCKcluster(nw)
registerDoParallel(cl)
nStates <- 5
initialState <- c(rep(1/nStates, nStates))
nSteps <- 10
N <- 10000

currentState <- foreach(n=idiv(N, chunks=nw), .combine='c') %dopar% {
  ind.arr <- matrix(1:(n * nSteps), ncol=nSteps, byrow=TRUE)
  cur <- vector("list", n * nSteps)
  for (i in 1:n) {
    TM <- matrix(runif(nStates^2), ncol=nStates)
    cur[[ind.arr[i,1]]] <- initialState %*% (TM / rowSums(TM))
    for (t in 2:nSteps) {
      TM <- matrix(runif(nStates^2), ncol=nStates)
      cur[[(ind.arr[i,t])]] <-
          cur[[(ind.arr[i,t-1])]] %*% (TM / rowSums(TM))
    }
  }
  cur
}

Instead of simply parallelizing the outer for loop, this adds a foreach loop around a smaller version of the sequential code. So if you figure out a way to improve the sequential code you can easily use that in the parallel version. You may also get better performance by not returning all of the intermediate states.

Upvotes: 2

erasmortg
erasmortg

Reputation: 3278

There is an example in this thread that could suit your needs. You would need to use replicate, from the lapply function in base.

 replicate(N, statefun(initialState, nSteps, nStates))

Upvotes: 0

Related Questions