jessexknight

Reputation: 824

Efficient append / event logging in simulation model in R

Context: I'm building a simulation model in R which runs for ~5200 timesteps with ~1000+ individuals. Each timestep, individuals can experience events at random with dynamic rates (~9 event types in total). Most events can occur at most once per individual per timestep (Bernoulli draws with p = 1-exp(-R.e*dt)), but some can happen multiple times per individual per timestep (Poisson draws with mean R.e*dt). I need to log the times of all events for all individuals.

MWE: Here is a minimal working example of the model as I've implemented it now.

system.time({
# init / params
dt  = 7    # timestep (days)
R.e = 0.1  # event rate per day (simplified)
N.i = 1000 # number of individuals
N.e = 9    # number of event types
events  = letters[1:N.e] # event names
e.group = list(bin=events[1:6],mul=events[7:9]) # binary vs multi
E = lapply(setNames(1:N.e,events),function(e){ # for each event
  lapply(1:N.i,function(i){ # for each indiv
    numeric(0) # empty event log
  })
})
rep.append = function(ts,t,n){ append(ts,rep.int(t,n)) } # helper
# event loop
for (t in 1:520){ # 10 years (shortened)
  for (e in e.group$bin){ # {0,1} per indiv per timestep (faster)
    i = which(runif(N.i) < (1-exp(-R.e*dt)))
    E[[e]][i] = lapply(E[[e]][i],append,t)
  }
  for (e in e.group$mul){ # 0+ per indiv per timestep (slower)
    ni = rpois(N.i,R.e*dt)
    i = which(ni>0) # update only those with event (a bit faster)
    E[[e]][i] = mapply(rep.append,E[[e]][i],t,ni[i],SIMPLIFY=FALSE)
    # E[[e]] = mapply(rep.append,E[[e]],t,ni,SIMPLIFY=FALSE) # slowest
  }
}
# done
})
   user  system elapsed 
  4.638   0.028   4.665 

Problem: The runtime bottleneck is how I'm logging events (E[[e]][i] = ...), due to R's copy-on-modify semantics, and it seems worse for the second (multiple per indiv per timestep) type of events. I'm seeking faster event-log implementations.
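For reference, one generic pattern that avoids reallocating on every event is amortized doubling: preallocate each individual's log and grow it geometrically. A sketch (new.log / log.add are hypothetical helpers, not part of my model):

new.log = function(cap=16){ list(ts=numeric(cap), n=0) } # preallocated buffer
log.add = function(log,t,n=1){ # record n copies of timestep t
  need = log$n + n
  if (need > length(log$ts)){ # grow by doubling (amortized O(1) per event)
    length(log$ts) = max(need, 2*length(log$ts))
  }
  log$ts[(log$n+1):need] = t
  log$n = need
  log
}
# usage: li = new.log(); li = log.add(li,3,2); li$ts[1:li$n] # 3 3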

Idea: One idea I had was to initialize an event-count matrix per event, e.g. matrix(0,ncol=N.i,nrow=5200), but this is obviously very memory-expensive, and I'm not sure it will scale for parallel HPC runs.
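To put a number on "memory-expensive", a back-of-envelope estimate at full scale (assuming double storage; integer matrices via 0L would halve this):

N.i = 1000; N.dt = 5200; N.e = 9
N.e * N.i * N.dt * 8 / 2^20 # ~357 MiB total across all event matrices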

Rcpp: I have explored Rcpp, but my actual rate calculations (simplified above) are faster in base R, thanks to optimizations like vectorization, recycling, and the built-in stats functions.

Upvotes: -1

Views: 32

Answers (1)

jessexknight

Reputation: 824

OK, matrices are the way to go! I guess I can just have a RAM bottleneck instead.

MWE:

system.time({
# init / params
dt   = 7    # timestep (days)
R.e  = 0.1  # event rate per day (simplified)
N.dt = 5200 # 100 years
N.i  = 1000 # number of individuals
N.e  = 9    # number of event types
events  = letters[1:N.e] # event names
e.group = list(bin=events[1:6],mul=events[7:9]) # binary vs multi
E = lapply(setNames(1:N.e,events),function(e){ # for each event
  matrix(0,nrow=N.i,ncol=N.dt)
})
# event loop
for (t in 1:N.dt){
  for (e in e.group$bin){ # {0,1} per indiv per timestep (faster)
    E[[e]][,t] = runif(N.i) < (1-exp(-R.e*dt))
  }
  for (e in e.group$mul){ # 0+ per indiv per timestep (slower)
    E[[e]][,t] = rpois(N.i,R.e*dt)
  }
}
# done
})
   user  system elapsed 
  0.056   0.004   0.061 
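
If the actual event times are needed back (per the original question), they can be recovered from the counts afterwards; a sketch using event "g" and individual 1 as an example:

n.t   = E[["g"]][1,]                          # counts per timestep
times = rep.int(which(n.t > 0), n.t[n.t > 0]) # timesteps, repeated by count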

Upvotes: 0
