Reputation: 1441
I am trying to run simulation scenarios that should give me the best scenario for a given date, backtested over a couple of months. Each scenario has 4 input variables, and each variable can be in one of 5 states (5^4 = 625 combinations). The flow of the model is as follows:
The unfortunate result is 5 nested for loops, which can take extremely long to run. I had a look at the foreach package, but I am concerned about how combining the outputs will work in my scenario.
The current code works as follows. First I create the possible states of each of the inputs, along with the date window:
a<-seq(as.Date("2015-12-01", "%Y-%m-%d"),as.Date(Sys.Date()-1, "%Y-%m-%d"),by="day")
#input variables
b<-seq(1,5,1)
c<-seq(1,5,1)
d<-seq(1,5,1)
e<-seq(1,5,1)
set.seed(3142)
tot_results<-NULL
Next, the nested for loops run through the simulations:
for(i in seq_along(a))
{
  cat(paste0("\n","Current estimation date: ", a[i]),";iteration:",i," \n")
  #subset data for backtesting
  dataset_calc<-dataset[which(dataset$Date<=a[i]),]
  p<-1
  #one row per scenario; two named columns so that results$profit exists below
  #(assumes profit_calc yields a single profit figure per scenario)
  results<-data.frame(unique_ID=rep(NA_character_,625),profit=rep(NA_real_,625))
  for(j in seq_along(b))
  {
    for(k in seq_along(c))
    {
      for(l in seq_along(d))
      {
        for(m in seq_along(e))
        {
          if(i==1)
          {
            #create a unique ID to merge onto later
            unique_ID<-paste0(replicate(1, paste(sample(LETTERS, 5, replace=TRUE), collapse="")),round(runif(n=1,min=1,max=1000000)))
          }
          #Run profit calculation
          post_sim_results<-profit_calc(dataset_calc, param1=e[m],param2=d[l],param3=c[k],param4=b[j])
          #Extract the final profit amount
          profit<-round(post_sim_results[nrow(post_sim_results),],2)
          results[p,]<-data.frame(unique_ID,profit)
          p<-p+1
        }
      }
    }
  }
  #extract the ranks for all scenarios
  rank<-rank(results$profit)
  #bind the ranks for the expanding window
  if(i==1)
  {
    tot_results<-data.frame(ID=results[,1],rank)
  }else{
    tot_results<-cbind(tot_results,rank)
  }
  suppressMessages(gc())
}
My biggest concern is the binding of the results, given that the outer loop's actions depend on the output of the inner loops.
Any advice on how to proceed would be greatly appreciated.
Upvotes: 0
Views: 477
Reputation: 3235
So I think that you can vectorize most of this, which should give a big reduction in run time.
Currently, you use for-loops (5, to be exact) to create every combination of values, and then run the values one by one through profit_calc (a function that is not specified). Ideally, you'd just take all possible combinations in one go and push them through profit_calc in one single operation.
-- Rationale --
a <- 1:10
b <- 1:10
d <- rep(NA,10)
for (i in seq(a)) d[i] <- a[i] * b[i]
d
# [1] 1 4 9 16 25 36 49 64 81 100
Since * also works on vectors, we can rewrite this to:
a <- 1:10
b <- 1:10
d <- a*b
d
# [1] 1 4 9 16 25 36 49 64 81 100
While it may save us only one line of code, it actually reduces the problem from 10 steps to 1 step.
-- Application --
So how does that apply to your code? Well, provided that profit_calc can be vectorized, you can generate a data frame in which each row is one possible combination of your parameters. We can do this with expand.grid:
foo <- expand.grid(b,c,d,e)
head(foo)
# Var1 Var2 Var3 Var4
# 1 1 1 1 1
# 2 2 1 1 1
# 3 3 1 1 1
# 4 4 1 1 1
# 5 5 1 1 1
# 6 1 2 1 1
Let's say we have a formula, (b - c) * (d + e), where b, c, d and e are the four columns of foo. Then it would work like:
bar <- (foo[,1] - foo[,2]) * (foo[,3] + foo[,4])
head(bar)
# [1] 0 2 4 6 8 -2
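If profit_calc turns out not to be vectorizable, the same grid can still replace the four inner loops. Below is a rough sketch of that idea, not your actual model: toy_profit, the value column and the 5-day dataset are all made up for illustration, since profit_calc and dataset were not posted. Each date produces one vector of 625 ranks, and sapply binds those vectors column-wise, which takes care of the combining step.

```r
# Hypothetical stand-in for profit_calc(): scalar parameters in, one number out
toy_profit <- function(data, param1, param2, param3, param4) {
  sum(data$value) * (param1 - param2) / (param3 + param4)
}

# Made-up data: one value per day (assumed structure, for illustration only)
dataset <- data.frame(
  Date  = seq(as.Date("2015-12-01"), by = "day", length.out = 5),
  value = 1:5
)

# All 5^4 = 625 parameter combinations, one per row
grid <- expand.grid(b = 1:5, c = 1:5, d = 1:5, e = 1:5)

# For one cutoff date: subset the data, evaluate every scenario, rank the profits
rank_for_date <- function(cutoff) {
  dataset_calc <- dataset[dataset$Date <= cutoff, ]
  profit <- mapply(
    function(b, c, d, e) {
      toy_profit(dataset_calc, param1 = e, param2 = d, param3 = c, param4 = b)
    },
    grid$b, grid$c, grid$d, grid$e
  )
  rank(profit)
}

# Loop over date indices; sapply returns a 625-row matrix with one column per date
ranks <- sapply(seq_along(dataset$Date), function(i) rank_for_date(dataset$Date[i]))
colnames(ranks) <- as.character(dataset$Date)

# The row order of grid is fixed, so the parameters themselves identify each scenario
tot_results <- cbind(grid, ranks)
```

Because every row of grid keeps its position across dates, no random unique ID is needed to line the ranks up afterwards.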
So basically, try to find a way to replace for-loops with vectorized options. If you cannot vectorize something, try looking into the apply family instead, which is usually more concise and can sometimes save you some time. If your code is running too slow, you'd ideally first see if you can write a more efficient script. Also, you may be interested in the microbenchmark package, or ?system.time, to measure where the time actually goes.
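As a quick illustration of timing with system.time, here is a small sketch that compares the loop and the vectorized multiplication from the rationale above on a larger input. The vector length of one million is an arbitrary choice, and the timings will differ from machine to machine, so none are quoted here.

```r
set.seed(1)
n <- 1e6
a <- runif(n)
b <- runif(n)

# Element-by-element loop
loop_time <- system.time({
  d_loop <- numeric(n)
  for (i in seq_len(n)) d_loop[i] <- a[i] * b[i]
})

# Single vectorized call
vec_time <- system.time(d_vec <- a * b)

# Check that both approaches agree before comparing the timings
stopifnot(isTRUE(all.equal(d_loop, d_vec)))

# Elapsed wall-clock time in seconds for each approach
print(loop_time["elapsed"])
print(vec_time["elapsed"])
```

For very fast expressions a single system.time call is noisy; that is where microbenchmark, which repeats each expression many times, tends to be the better tool.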
Upvotes: 1