Hanjo Odendaal

Reputation: 1441

Parallel processing for multiple nested for loops

I am trying to run simulation scenarios that should give me the best scenario for a given date, backtested over the last couple of months. Each scenario has 4 input variables, and each variable can be in one of 5 states (625 combinations). The flow of the model is as follows:
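Each of the 4 inputs independently takes one of 5 states, so the scenario count is 5^4 = 625 (combinations with repetition). A quick check in R, where states is just an illustrative name:

```r
# each of the 4 inputs can be in one of 5 states
states <- 1:5

# one row per scenario: every combination of the 4 inputs
grid <- expand.grid(states, states, states, states)

nrow(grid)            # 625
nrow(unique(grid))    # 625, so no scenario appears twice
length(states)^4      # 625
```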

  1. Simulate the 625 scenarios to get the profit of each
  2. Rank each of the scenarios according to their profit
  3. Repeat the process over an expanding window, extended one day at a time, for the last 2 months starting on 1 Dec 2015, creating a time series of ranks for each of the 625 scenarios

The unfortunate result is 5 nested for loops, which can take extremely long to run. I had a look at the foreach package, but I am concerned about how combining the outputs would work in my scenario.
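On the combining concern: each date's ranking depends only on the data up to that date, not on the ranks from earlier dates, so the outer date loop can run in parallel, and combining reduces to column-binding one rank vector per date. Below is a minimal sketch using the base parallel package instead of foreach (with foreach, the same combining step is the .combine = cbind argument). toy_data, rank_for_date and the sin() placeholder are hypothetical stand-ins for dataset and profit_calc.

```r
library(parallel)

# hypothetical stand-in for the question's dataset
set.seed(3142)
toy_data <- data.frame(Date = seq(as.Date("2015-11-01"), as.Date("2016-01-31"), by = "day"))
toy_data$x <- rnorm(nrow(toy_data))
dates <- seq(as.Date("2015-12-01"), as.Date("2016-01-31"), by = "day")

# ranks of all 625 scenarios for estimation date i; depends only on the
# data up to dates[i], never on the ranks of earlier dates
rank_for_date <- function(i) {
  window <- toy_data[toy_data$Date <= dates[i], ]
  profit <- sin(seq_len(625) * sum(window$x))  # placeholder for 625 profit_calc results
  rank(profit)
}

# forking (mc.cores > 1) is not supported on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
rank_list <- mclapply(seq_along(dates), rank_for_date, mc.cores = n_cores)

# combine: one column per date, one row per scenario
tot_ranks <- do.call(cbind, rank_list)
colnames(tot_ranks) <- format(dates)
dim(tot_ranks)   # 625 62
```

Because every date is computed independently, the parallel result is the same as running lapply() serially.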

The code I am currently using works as follows. First, I create the possible states of each input, along with the window:

#dates for the expanding window
a <- seq(as.Date("2015-12-01"), Sys.Date() - 1, by = "day")
#input variables
b <- seq(1, 5, 1)
c <- seq(1, 5, 1)
d <- seq(1, 5, 1)
e <- seq(1, 5, 1)

set.seed(3142)

tot_results <- NULL

Next, the nested for loops run through the simulations:

for(i in 1:length(a))
{
    cat(paste0("\n", "Current estimation date: ", a[i], "; iteration: ", i, " \n"))
    #subset data for backtesting
    dataset_calc <- dataset[which(dataset$Date <= a[i]), ]
    p <- 1
    #one row per scenario: its ID and its final profit
    results <- data.frame(ID = rep(NA_character_, 625), profit = rep(NA_real_, 625),
                          stringsAsFactors = FALSE)
    for(j in 1:length(b))
    {
        for(k in 1:length(c))
        {
            for(l in 1:length(d))
            {
                for(m in 1:length(e))
                {
                    if(i == 1)
                    {
                        #create a unique ID to merge onto later
                        unique_ID <- paste0(paste(sample(LETTERS, 5, replace = TRUE), collapse = ""),
                                            round(runif(n = 1, min = 1, max = 1000000)))
                    }
                    #run profit calculation
                    post_sim_results <- profit_calc(dataset_calc, param1 = e[m], param2 = d[l], param3 = c[k], param4 = b[j])
                    #extract the final profit amount
                    profit <- round(post_sim_results[nrow(post_sim_results), ], 2)

                    results[p, ] <- list(unique_ID, profit)
                    p <- p + 1
                }
            }
        }
    }
    #extract the ranks for all scenarios (named rank_i so it does not mask rank())
    rank_i <- rank(results$profit)

    #bind the ranks for the expanding window
    if(i == 1)
    {
        tot_results <- data.frame(ID = results$ID, rank_i)
    } else {
        tot_results <- cbind(tot_results, rank_i)
    }
    suppressMessages(gc())
}
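The random ID in the loop (5 letters plus a rounded runif() draw) is not guaranteed to be unique. One alternative, offered as a suggestion rather than part of the original approach, is to build all 625 IDs once, before the loop, from the parameter values themselves, which are unique by construction. Listing e, d, c, b in that order keeps the same row order as the nested loops.

```r
b <- c <- d <- e <- 1:5

# same order as the nested loops: param1 (e[m]) varies fastest, param4 (b[j]) slowest
scenarios <- expand.grid(param1 = e, param2 = d, param3 = c, param4 = b)

# deterministic ID built from the parameter values, e.g. "2_1_1_1"
scenarios$ID <- paste(scenarios$param1, scenarios$param2,
                      scenarios$param3, scenarios$param4, sep = "_")

head(scenarios$ID, 3)   # "1_1_1_1" "2_1_1_1" "3_1_1_1"
```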

My biggest concern is how to bind the results, given that the outer loop's actions depend on the output of the inner loops.
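On the binding side, growing tot_results with cbind() builds a new, wider data frame on every date. A common alternative is to allocate a scenarios-by-dates matrix once and fill column i on each pass. A minimal sketch; runif() is only a placeholder for the 625 profits that profit_calc would return on each date.

```r
set.seed(3142)
n_scen <- 625
dates <- seq(as.Date("2015-12-01"), as.Date("2016-01-31"), by = "day")

# allocated once: one row per scenario, one column per estimation date
rank_mat <- matrix(NA_real_, nrow = n_scen, ncol = length(dates),
                   dimnames = list(NULL, format(dates)))

for (i in seq_along(dates)) {
  profit <- runif(n_scen)         # placeholder for the real profits on dates[i]
  rank_mat[, i] <- rank(profit)   # fill column i; no cbind() on each pass
}

dim(rank_mat)   # 625 62
```

Afterwards, data.frame(ID = ..., rank_mat, check.names = FALSE) gives the same ID-plus-ranks layout as tot_results.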

Any advice on how to proceed would be greatly appreciated.

Upvotes: 0

Views: 477

Answers (1)

slamballais

Reputation: 3235

So I think that you can vectorize most of this, which should give a big reduction in run time.

Currently, you use five nested for-loops: one over the dates and four to create every combination of parameter values, which you then run one by one through profit_calc (a function that is not shown in the question). Ideally, you'd generate all possible combinations in one go and push them through profit_calc in a single operation.

-- Rationale --

a <- 1:10
b <- 1:10
d <- rep(NA,10)
for (i in seq_along(a)) d[i] <- a[i] * b[i]
d 

# [1]   1   4   9  16  25  36  49  64  81 100

Since * is vectorized (it works element-wise on whole vectors), we can rewrite this as:

a <- 1:10
b <- 1:10
d <- a*b
d

# [1]   1   4   9  16  25  36  49  64  81 100

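While it may save us only one line of code, it replaces 10 separate R-level steps with a single call to *, which loops over the elements in compiled code. On large vectors this is usually much faster.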
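To make the difference concrete, here is a rough timing comparison on a larger input. The vectorized * still touches every element, but its loop runs in compiled code rather than as one interpreted R step per element. Exact timings depend on the machine, so none are claimed here; the two results are identical.

```r
n <- 1e6
a <- runif(n)
b <- runif(n)

# one interpreted R step per element
loop_time <- system.time({
  d_loop <- numeric(n)
  for (i in seq_len(n)) d_loop[i] <- a[i] * b[i]
})

# a single call; the element-wise loop happens in compiled code
vec_time <- system.time(d_vec <- a * b)

identical(d_loop, d_vec)   # TRUE
rbind(loop = loop_time, vectorized = vec_time)[, "elapsed"]
```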

-- Application --

So how does that apply to your code? Assuming profit_calc can be vectorized, you can generate a data frame in which each row is one possible combination of your parameters and pass all of them through in one go. We can build that data frame with expand.grid:

foo <- expand.grid(b,c,d,e)
head(foo)

#   Var1 Var2 Var3 Var4
# 1    1    1    1    1
# 2    2    1    1    1
# 3    3    1    1    1
# 4    4    1    1    1
# 5    5    1    1    1
# 6    1    2    1    1
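When swapping the nested loops for expand.grid, it helps to confirm the rows come out in the same order. expand.grid varies its first argument fastest, and in the question's loops param1 = e[m] (the innermost loop) varies fastest, so passing e, d, c, b in that order reproduces the loop order. The distinct values per input below are illustrative only, chosen so that the check is not trivially true.

```r
# distinct (illustrative) values per input so the column order actually matters
b <- 1:5; c <- 11:15; d <- 21:25; e <- 31:35

# rows in the order the question's nested loops visit them
loop_rows <- NULL
for (j in seq_along(b))
  for (k in seq_along(c))
    for (l in seq_along(d))
      for (m in seq_along(e))
        loop_rows <- rbind(loop_rows, cbind(e[m], d[l], c[k], b[j]))

# the same scenarios in a single call, with descriptive column names
grid <- expand.grid(param1 = e, param2 = d, param3 = c, param4 = b)

all(as.matrix(grid) == loop_rows)   # TRUE
```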

Let's say we have a formula such as (a - b) * (c + d), applied to the four columns of foo. Then it would work like this:

bar <- (foo[,1] - foo[,2]) * (foo[,3] + foo[,4])
head(bar)

# [1]  0  2  4  6  8 -2
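Putting the pieces together, only the date loop remains, and all 625 scenarios are scored in one call per date. This sketch assumes profit_calc can be rewritten to accept vectors of parameters; toy_profit and the simulated toy_data are hypothetical stand-ins.

```r
# hypothetical vectorized stand-in for profit_calc: one profit per scenario
toy_profit <- function(window, p1, p2, p3, p4) {
  mean(window$x) * p1 - sd(window$x) * p2 + p3 / p4
}

set.seed(3142)
toy_data <- data.frame(Date = seq(as.Date("2015-11-01"), as.Date("2016-01-31"), by = "day"))
toy_data$x <- rnorm(nrow(toy_data))
dates <- seq(as.Date("2015-12-01"), as.Date("2016-01-31"), by = "day")
grid <- expand.grid(param1 = 1:5, param2 = 1:5, param3 = 1:5, param4 = 1:5)

# one column of ranks per date; sapply simplifies the list into a 625 x 62 matrix
rank_mat <- sapply(seq_along(dates), function(i) {
  window <- toy_data[toy_data$Date <= dates[i], ]
  rank(toy_profit(window, grid$param1, grid$param2, grid$param3, grid$param4))
})

# best scenario (highest rank, i.e. highest profit) on the last date
grid[which.max(rank_mat[, ncol(rank_mat)]), ]
```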

So basically, try to replace for-loops with vectorized operations. If you cannot vectorize something, look into the apply family (e.g. sapply, mapply) instead; it usually makes the code cleaner and can sometimes save time. If your code is still too slow, first see whether you can write a more efficient script before reaching for parallel processing. To measure any improvement, you may be interested in the microbenchmark package or system.time() (see ?system.time).
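If a function cannot be vectorized, the apply family can still replace the four inner loops. mapply calls the function once per row of the grid, so it tidies the code more than it speeds it up. scalar_profit is a hypothetical example whose if() only accepts a single value; for comparison, the same logic is also written with the vectorized ifelse().

```r
# hypothetical function that only works on single values (because of if)
scalar_profit <- function(p1, p2, p3, p4) {
  if (p1 > p2) p1 * p3 - p4 else p2 - p3 * p4
}

grid <- expand.grid(param1 = 1:5, param2 = 1:5, param3 = 1:5, param4 = 1:5)

# apply-family version: one call per scenario, columns paired up row by row
profit_apply <- mapply(scalar_profit, grid$param1, grid$param2, grid$param3, grid$param4)

# the same logic vectorized with ifelse(), for comparison
profit_vec <- with(grid, ifelse(param1 > param2, param1 * param3 - param4,
                                param2 - param3 * param4))

all(profit_apply == profit_vec)   # TRUE
system.time(mapply(scalar_profit, grid$param1, grid$param2, grid$param3, grid$param4))
```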

Upvotes: 1
