PaulH
PaulH

Reputation: 1

Speeding up Monte Carlo simulations?

I am currently doing research to predict where kudzu (an invasive vine) will spread in Oklahoma over a five year time by using Monte Carlo simulation. I have created a raster with presence points and loaded it into R.

For each Monte Carlo simulation (each year), I am running 6000 iterations to provide accurate results. However, due to "for" loops, this is taking a long time. The first year usually finishes running in 3 days, however the second year has been running for over 3 weeks and still is not complete.

Is there any way to speed this process up?

Each year builds off the previous one. I have provided the code below for the first two years:

OK.rast16<-raster("OK_rast20161.tif")
p.a16<-as.matrix(OK.rast16)
table(p.a16)

# Set the random number seed so results can be reproduced if needed
set.seed(10)

drow.pa16 <- 133.7197873 # distance of grid cell (meters) in n-s direction
trow.pa16 <- length(p.a16[,1]) # total number of rows

dcol.pa16 <- 133.7197873 # distance of grid cell (meters) in e-w direction
tcol.pa16 <- length(p.a16[1,]) # total number of rows

#####Year 1 of infection in 2016#####

kudzu_sim1_16 <- matrix(0,trow.pa16,tcol.pa16)
for(m in 1:6000)
{
  OK.kudzu_16 <- p.a16 # initialize matrix of annual dispersal
  for(i in 1:trow.pa16)
{
  for(j in 1:tcol.pa16)
{
  if(!is.na(p.a16[i,j]) & p.a16[i,j] == 1)
  {
    for(k in 1:trow.pa16)
    {
      for(l in 1:tcol.pa16)
      {
        if(!is.na(OK.kudzu_16[k,l]) & OK.kudzu_16[k,l] == 0)
        {
          distcalc <- sqrt((abs(i-k)*drow.pa16)^2+(abs(j-l)*dcol.pa16)^2)
          prob <- exp(0.0369599-0.00474401*distcalc)
          if(prob>runif(1)) {OK.kudzu_16[k,l] <- 1}
        }
      }
    }
   }
  }
 }
 kudzu_sim1_16 <- OK.kudzu_16+kudzu_sim1_16
}

#####Year 2 of infection in 2016####

kudzu_sim2_16 <- matrix(0,trow.pa16,tcol.pa16)
for(m in 1:6000)
{
  OK.kudzu1_16 <- OK.kudzu_16 # initialize matrix of annual dispersal
   for(i in 1:trow.pa16)
  {
    for(j in 1:tcol.pa16)
   {
     if(!is.na(OK.kudzu_16[i,j]) & OK.kudzu_16[i,j] == 1)
  {
    for(k in 1:trow.pa16)
    {
      for(l in 1:tcol.pa16)
      {
        if(!is.na(OK.kudzu1_16[k,l]) & OK.kudzu1_16[k,l] == 0)
        {
          distcalc <- sqrt((abs(i-k)*drow.pa16)^2+(abs(j-l)*dcol.pa16)^2)
          prob <- exp(0.0369599-0.00474401*distcalc)
          if(prob>runif(1)) {OK.kudzu1_16[k,l] <- 1}
        }
      }
    }
  }
 }
}
kudzu_sim2_16 <- OK.kudzu1_16+kudzu_sim2_16
}

Here is the raster to load to start the code:

kudzu in OK

Upvotes: 0

Views: 521

Answers (1)

John Coleman
John Coleman

Reputation: 52008

Originally a comment, but it rapidly exceeded the length limit:

1) 133mx133m is a very small grid size for spatial simulations on something as large as an entire state. It might help to find a way to make resolution a parameter of your simulation (rather than a hard-wired number), streamline the code so that it runs well at a larger resolution, then increase the resolution. The raster function has an optional parameter named res which can be used to control the resolution.

2) While vectorization will surely help, it is unlikely to transform an algorithm which runs for weeks with no output into one which runs in a reasonable amount of time. Perhaps you need to fundamentally rethink your algorithm. You seem to be comparing every grid cell with every other grid cell. That doesn't strike me as biologically plausible. Kudzu spreads locally. Why should what happens next year to a given 133m x 133m cell in the Oklahoma panhandle depend on the current state of another cell over by Lake Texoma? If your simulation has any biological realism, exp(0.0369599-0.00474401*distcalc) should be negligibly small for two such cells, but your code doesn't neglect it. An algorithm which is in some sense localized might be better.

3) There is an awful lot of entries in your matrix which correspond to points outside of Oklahoma. Unless your model is designed to see how kudzu also diffuses over a large part of Texas, those might be irrelevant for your program. If so, a fundamentally different data structure (e.g. a list of locations) which only has entries for points in Oklahoma might be preferable. Or, maybe not. Just something to think about.

4) For more detailed help, it would help if you explain what your algorithm actually is (and not just what it intends to do). It isn't completely obvious in a quick read of your code.

Upvotes: 0

Related Questions