Efficient Monte Carlo simulation over a grid in R

I am running a Monte Carlo simulation of a multinomial logit model. I have a function that generates the data and estimates the model, and I want to generate datasets over a grid of values, varying both the number of individuals (n.indiv) and the number of answers given by each individual (n.choices).

It works, but I ended up with a nested for-loop over the grid of possible values for the number of individuals (n.indiv_list) and the number of answers per individual (n.choices_list). I am worried about the efficiency of this last bit of code, the double for-loop that runs over all combinations of those values. There is probably a vectorized way to do it that I am missing (or maybe not?).

Also, and this is mostly a matter of style, I end up with multiple objects, one per grid combination, each holding the models for that combination under an informative name. It would be great to collect all of them in a single list, but with the current structure I am not sure how to do it. Thank you in advance!

1) Function that generates data and estimates the model.

library(dplyr)
library(VGAM)
library(mlogit)

#function that generates the data and estimates the model.
mlogit_sim_data <- function(...){
  
  # generating number of (n.alter) X (n.choices)
  df <- data.frame(id= rep(seq(1,n.choices ),n.alter ))
  
  # id per individual
  df <- df %>%
    group_by(id) %>%
    mutate(altern = sequence(n()))%>%
    arrange(id)
  
  #Repeated scheme for each individual + id_ind
  df <- cbind(df[rep(1:nrow(df), n.indiv), ], id_ind = rep(1:n.indiv, each = nrow(df)))
  
  ## creating attributes
  df<- df %>%
    mutate(
      x1=rlnorm(n()),  # n() = number of rows in the current data (or group), so sizes always match
      x2=rlnorm(n())
    )%>%
    group_by(altern) %>%
    mutate(
      id_choice = sequence(n()))%>%
    group_by(id_ind) %>%
    mutate(
      z1 = rpois(1,lambda = 25),
      z2 = rlnorm(1,meanlog = 5, sdlog  = 0.5),
      z3 = ifelse(runif(1, min = 0 , max = 1) > 0.5 , 1 , 0)
    )
  
  # Observed utility
  df$V1 <- with(df,  b1  * x1 +   b2 * x2 )
  
  #### Generate Response Variable ####
  fn_choice_generator <- function(V){
    U <- V + rgumbel(length(V), 0, 1)
    1L * (U == max(U))
  }
  
  # Using fn_choice_generator to generate 'choice' columns 
  df <-  df %>%
    group_by(id_choice) %>%
    mutate(across(starts_with("V"), 
                  fn_choice_generator, .names = "choice_{.col}")) %>% # generating choice(s)
    select(-starts_with("V")) %>% ##drop V variables.
    select(-c(id,id_ind))
  
  
  tryCatch(
    {
      model_result <- mlogit(choice_V1 ~ 0 +  x1 + x2 |1  ,
                                                  data = df,
                                                  idx = c("id_choice", "altern"))
      return(model_result)
    },
    error = function(e){
      return(NA)
    }
  )
  
}
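
As a sanity check on the idea behind fn_choice_generator (add a Gumbel error to the observed utility and pick the maximum), here is a minimal base-R sketch. The utilities in V, the number of draws and the helper rgumbel_base are assumptions made up for this illustration; rgumbel_base is only a stand-in for VGAM::rgumbel with location 0 and scale 1. If the generator behaves as intended, the simulated choice shares should be close to the multinomial logit probabilities exp(V) / sum(exp(V)).

```r
set.seed(1)

# Assumed example values, not taken from the simulation above
V <- c(1.0, 0.5, -0.5)   # observed utilities of three alternatives
n_draws <- 20000         # number of simulated choice situations

# Standard Gumbel draws via the inverse CDF (hypothetical stand-in for VGAM::rgumbel)
rgumbel_base <- function(n) -log(-log(runif(n)))

# One simulated choice per draw: the alternative with the highest total utility
choices <- replicate(n_draws, which.max(V + rgumbel_base(length(V))))

# Share of draws in which each alternative was chosen
simulated <- as.numeric(table(factor(choices, levels = seq_along(V)))) / n_draws

# Multinomial logit choice probabilities implied by V
theoretical <- exp(V) / sum(exp(V))

round(rbind(simulated, theoretical), 3)
```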

2) Grid search over possible combinations of the data

# Values that vary across simulations
# number of individuals
n.indiv_list <- c(1, 15, 100, 500)
# number of choice situations
n.choices_list <- c(1, 2, 4, 8, 10)

# Values that remain constant across simulations
# number of alternatives
n.alter <- 3

## Real parameters
b1 <- 1
b2 <- 2

#Number of reps
nreps <- 10 
#Set seed
set.seed(777)

#iteration over different values in the simulation 
for(i in n.indiv_list) {
  for(j in n.choices_list) {
    n.indiv <- i
    n.choices <- j
    assign(paste0("m_ind_", i, "_choices_", j), lapply(X   = 1:nreps, FUN = mlogit_sim_data))
  }
}

Upvotes: 1

Views: 281

Answers (1)

SteveM

Reputation: 2301

You can replace the nested loop with the map2 function from the purrr package, which iterates over two vectors in parallel:

library(tidyverse)

n.indiv_list <- c(1, 15, 100, 500 ) 
#number of choice situations
n.choices_list <- c(1, 2, 4, 8, 10)
l1 <- length(n.indiv_list)
l2 <- length(n.choices_list)
v1 <- rep(n.indiv_list, each = l2)
v2 <- rep(n.choices_list, l1)  #v1, v2 generate all pairs
# > v1
#  [1]   1   1   1   1   1  15  15  15  15  15 100 100 100 100 100 500 500 500 500 500
# > v2
#  [1]  1  2  4  8 10  1  2  4  8 10  1  2  4  8 10  1  2  4  8 10
    
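result <- map2(v1, v2, function(ni, nc) {
  # mlogit_sim_data reads n.indiv and n.choices from the global environment,
  # so they have to be set there before each batch of replications
  n.indiv <<- ni
  n.choices <<- nc
  lapply(X = 1:nreps, FUN = mlogit_sim_data)
})
# informative names, one per (n.indiv, n.choices) pair
names(result) <- paste0("m_ind_", v1, "_choices_", v2)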

result will be a list with one element per (v1, v2) pair, each holding the nreps outputs of your function.
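
If the simulation function can be changed to take the sizes as arguments instead of reading them from the global environment, the grid and a named list can also be built in base R. The following is a minimal sketch under that assumption: sim_one is a hypothetical stand-in whose body you would replace with your data generation and mlogit call, while expand.grid, Map and replicate are base R functions. The loop over the 20 grid cells is itself cheap; most of the run time is presumably spent fitting the models, so this is mainly about tidiness rather than speed.

```r
# Hypothetical stand-in for the simulation: it takes the sizes as arguments.
# Replace the body with the data generation and the mlogit() call.
sim_one <- function(n.indiv, n.choices, n.alter = 3) {
  data.frame(n.indiv = n.indiv, n.choices = n.choices,
             n.rows = n.indiv * n.choices * n.alter)
}

n.indiv_list   <- c(1, 15, 100, 500)
n.choices_list <- c(1, 2, 4, 8, 10)
nreps <- 10

# All combinations, one row per (n.indiv, n.choices) pair
grid <- expand.grid(n.indiv = n.indiv_list, n.choices = n.choices_list)

set.seed(777)
# One list element per grid row, each holding nreps replications
results <- Map(
  function(ni, nc) replicate(nreps, sim_one(ni, nc), simplify = FALSE),
  grid$n.indiv, grid$n.choices
)
names(results) <- paste0("m_ind_", grid$n.indiv, "_choices_", grid$n.choices)

length(results)                          # 20 grid cells
length(results[["m_ind_15_choices_4"]])  # 10 replications in that cell
```

Each cell of the grid is then available by name, for example results[["m_ind_15_choices_4"]][[1]] for the first replication with 15 individuals and 4 choice situations.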

Upvotes: 1
