Reputation: 824
Context: I'm building a simulation model in R which runs for ~5200 timesteps with ~1000+ individuals. Each timestep, individuals can experience events by random chance with dynamic rates (~9 event types in total). Most events can occur at most once per individual per timestep, but some can happen multiple times per individual per timestep. I need to log the times of all events for all individuals.
MWE: Here is a minimal working example of the model as I've implemented it now.
system.time({
# init / params
dt = 7 # timestep (days)
R.e = 0.1 # event rate per day (simplified)
N.i = 1000 # number of individuals
N.e = 9 # number of event types
events = letters[1:N.e] # event names
e.group = list(bin=events[1:6],mul=events[7:9]) # binary vs multi
E = lapply(setNames(1:N.e,events),function(e){ # for each event
lapply(1:N.i,function(i){ # for each indiv
numeric(0) # empty event log
})
})
rep.append = function(ts,t,n){ append(ts,rep.int(t,n)) } # helper
# event loop
for (t in 1:520){ # 10 years (shortened)
for (e in e.group$bin){ # {0,1} per indiv per timestep (faster)
i = which(runif(N.i) < (1-exp(-R.e*dt)))
E[[e]][i] = lapply(E[[e]][i],append,t)
}
for (e in e.group$mul){ # 0+ per indiv per timestep (slower)
ni = rpois(N.i,R.e*dt)
i = which(ni>0) # update only those with event (a bit faster)
E[[e]][i] = mapply(rep.append,E[[e]][i],t,ni[i],SIMPLIFY=FALSE)
# E[[e]] = mapply(rep.append,E[[e]],t,ni,SIMPLIFY=FALSE) # slowest
}
}
# done
})
user system elapsed
4.638 0.028 4.665
Problem:
The runtime bottleneck is how I'm logging events (E[[e]][i] = ...), due to R's copy-on-modify semantics, and it seems worse for the second (multi-per-individual-per-timestep) event type. I'm seeking faster event-log implementation ideas.
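For illustration, here is a toy comparison (separate from the model) of why the repeated appends hurt: growing a vector re-allocates and copies it on every call, whereas filling a preallocated vector modifies it in place.
system.time({ v = numeric(0); for (k in 1:1e4) v = append(v,k) }) # copies v every iteration
system.time({ v = numeric(1e4); for (k in 1:1e4) v[k] = k }) # fills in place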
Idea:
One idea I had was to initialize an event-count matrix per event, e.g. matrix(0,ncol=N.i,nrow=5200), but this is obviously very memory-expensive and I'm not sure it will scale when run in parallel on HPC.
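Back-of-envelope (assuming double matrices at 8 bytes per cell), that costs roughly:
5200*1000*8/2^20 # ~40 MB per event matrix
9*5200*1000*8/2^20 # ~357 MB for all 9 event types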
Rcpp: I have explored Rcpp, but my actual rate calculations (simplified above) are actually faster in base R, due to optimizations like vectorized broadcasting and the built-in stats functions.
Upvotes: -1
Views: 32
Reputation: 824
OK, matrices are the way to go! I guess I'll just have a RAM bottleneck instead.
MWE:
system.time({
# init / params
dt = 7 # timestep (days)
R.e = 0.1 # event rate per day (simplified)
N.dt = 5200 # 100 years
N.i = 1000 # number of individuals
N.e = 9 # number of event types
events = letters[1:N.e] # event names
e.group = list(bin=events[1:6],mul=events[7:9]) # binary vs multi
E = lapply(setNames(1:N.e,events),function(e){ # for each event
matrix(0,nrow=N.i,ncol=N.dt)
})
# event loop
for (t in 1:N.dt){
for (e in e.group$bin){ # {0,1} per indiv per timestep (faster)
E[[e]][,t] = runif(N.i) < (1-exp(-R.e*dt))
}
for (e in e.group$mul){ # 0+ per indiv per timestep (slower)
E[[e]][,t] = rpois(N.i,R.e*dt)
}
}
# done
})
user system elapsed
0.056 0.004 0.061
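For the record, the original per-individual event times are recoverable from these count matrices; a quick sketch (event names as above, i = some individual):
i = 1
which(E[["a"]][i,] > 0) # binary event: timesteps where the event occurred
rep(seq_len(N.dt), E[["g"]][i,]) # multi event: each timestep repeated by its count
Storing counts as integers (matrix(0L,nrow=N.i,ncol=N.dt)) should also roughly halve the RAM footprint.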
Upvotes: 0