Hailey

Reputation: 21

How to write an R function that iterates through a data frame of formulas to fit multiple model variations

I have a list of many combinations of model variables that I'm testing for model fit. I need to figure out how to write R code that iterates through each combination and fits the corresponding model. This is what I have so far:

This chunk builds a character vector containing all of the model formulas:

# example list of the variables
var <- c("A", "B", "C", "D")

n = length(var)

# make list of all possible combinations
id <- unlist(
  lapply(1:n,
         function(i) combn(1:n,i,simplify = FALSE)), recursive = FALSE)

# make the combinations into formulas
frmlas <- (sapply(id, function(i)
  paste("DV ~ ", paste(var[i], collapse = "+"))))

This chunk is where I am stuck:

# Add IDs to the model combinations for naming the outputs numerically:
frmlasnum <- as.data.frame(frmlas)
frmlasnum$ID <- seq.int(nrow(frmlasnum))

# Now make a function that fits the models while outputting an .rds file for each:

modelfit <- function(frmlasnum) {
  for (x in 1:length(frmlasnum)) {
    name <- df[x,"ID"]
    model <- ssn_lm(formula = x, ssn.object = df)
    write_rds(model, paste(name,".rds"))
  }
}

# I omitted the rest of the ssn_lm functions after ssn.object for simplicity, not running the model without them

I know I have a bunch of wrong things in there, and I'm sorry this isn't reproducible but I'm hoping someone can give me advice on how to fix the function. Thank you in advance.

Upvotes: 2

Views: 78

Answers (2)

Rui Barradas

Reputation: 76641

This is based on Edward's answer. Its only purpose is to simplify his code.

library(SSN2)

copy_lsn_to_temp()
temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(temp_path, overwrite = TRUE)

# use 'var' directly in 'combn', no need for 'n' gymnastics
var <- c("ELEV_DEM", "SLOPE", "rcaAreaKm2")
id2 <- unlist(
  lapply(seq_along(var), function(i) combn(var, i, simplify = FALSE)), 
  recursive = FALSE)

# use 'reformulate' instead of 'paste', it produces formula objects
frmlas2 <- sapply(id2, reformulate, response = "Summer_mn")
# apply the modeling function
models2 <- lapply(frmlas2, ssn_lm, ssn.object = mf04p)

# extract coefficients, summaries, etc
lapply(models2, coef)
lapply(models2, summary)

# or save the summaries in a list and extract what you want from the list
models2_smry <- lapply(models2, summary)
# Pseudo-R2
sapply(models2_smry, `[[`, "pseudoR2")
# p-values
sapply(models2_smry, \(x) x$coefficients$fixed$p)
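One optional addition: naming the list elements by their formula makes the extracted results easier to match to a model. The sketch below shows the idea with lm() and the built-in mtcars data as stand-ins, so it runs without SSN2. The same naming step should carry over to a list of ssn_lm fits, but that is an assumption and is not demonstrated here.

```r
# Stand-in example: lm() and mtcars replace ssn_lm() and the SSN object
var <- c("wt", "hp")
id <- unlist(
  lapply(seq_along(var), function(i) combn(var, i, simplify = FALSE)),
  recursive = FALSE)
frmlas <- lapply(id, reformulate, response = "mpg")

# Label each formula with its own text so results are self-describing
names(frmlas) <- sapply(frmlas, function(f) paste(deparse(f), collapse = " "))

# lapply() keeps the names, so every downstream result is labelled too
models <- lapply(frmlas, lm, data = mtcars)
r2 <- sapply(models, function(m) summary(m)$r.squared)
names(r2)
```

Because the names travel with the list, the output of sapply() is a named vector, and there is no need to look up which position corresponds to which formula.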

Edit

To answer the follow-up question in the comments:

How would I write frmlas2 if I want to include additional variables in every formula? I have my list of variables in var, but I want to add variables not included in var to every model variation. Example: var <- c("A", "B"), I want every formulation to have "C" in addition, so model formulations would be DV ~ A+C, DV ~ B+C, DV ~ A+B+C.

other_vars <- c("D", "E")
frmlas3 <- sapply(id2, \(v) reformulate(c(v, other_vars), response = "Summer_mn"))
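As a quick check of this pattern, here is a minimal sketch using the placeholder names from the comment (A, B, C and DV are not columns of mf04p). It only builds the formulas and does not fit anything, so it runs in base R without SSN2.

```r
# Placeholder names taken from the comment; not real columns in any dataset
var <- c("A", "B")
other_vars <- "C"

# All non-empty combinations of 'var'
id <- unlist(
  lapply(seq_along(var), function(i) combn(var, i, simplify = FALSE)),
  recursive = FALSE)

# Append the fixed variable(s) to every combination
frmlas <- lapply(id, function(v) reformulate(c(v, other_vars), response = "DV"))

# Show the formulas as strings
shown <- sapply(frmlas, function(f) paste(deparse(f), collapse = " "))
shown
# "DV ~ A + C"  "DV ~ B + C"  "DV ~ A + B + C"
```

Note that this produces only the combinations that include at least one element of var; a model with C alone would have to be added separately if it is wanted.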

Upvotes: 1

Edward

Reputation: 19339

Since no data was provided, I'll use the mf04p dataset from the SSN2 package, which contains the ssn_lm function shown.

Load the package:

library(SSN2)

Load the data:

copy_lsn_to_temp()
temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(temp_path, overwrite = TRUE)
#mf04p 

This is your code:

Example list of the variables

var <- c("ELEV_DEM", "SLOPE", "rcaAreaKm2")

n = length(var)

Make a list of all possible combinations

id <- unlist(
  lapply(1:n,
         function(i) combn(1:n,i,simplify = FALSE)), recursive = FALSE)

Make the combinations into formulas. Note the use of as.formula, which converts each string into a formula object:

frmlas <- (sapply(id, function(i)
  as.formula(paste("Summer_mn ~ ", paste(var[i], collapse = "+")))))

This is where you got stuck:

Run the models using lapply:

models <- lapply(frmlas, ssn_lm, ssn.object = mf04p)

Save to RDS files:

invisible(
  mapply(saveRDS, object=models, file=paste0("Model_", seq_along(models), ".rds"))
)
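If you would rather keep the loop from the question, a corrected version could look roughly like the sketch below. Since ssn_lm needs an SSN object, it uses lm() and the built-in mtcars data as stand-ins. The names fit_and_save and out_dir and the "Model_" file prefix are illustrative choices, not part of SSN2. The changes relative to the question are: loop over rows with seq_len(nrow()) instead of length(), pass the formula rather than the loop index, and use paste0() so the file name has no space before ".rds".

```r
# Stand-in example: lm() and mtcars replace ssn_lm() and the SSN object
var <- c("wt", "hp", "disp")
id <- unlist(
  lapply(seq_along(var), function(i) combn(var, i, simplify = FALSE)),
  recursive = FALSE)

# Data frame of formula strings with a numeric ID, as in the question
frmlasnum <- data.frame(
  frmlas = sapply(id, function(v) paste("mpg ~", paste(v, collapse = " + "))),
  ID = seq_along(id),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fit one model per row and save each to an .rds file
fit_and_save <- function(frmlasnum, data, out_dir = tempdir()) {
  files <- character(nrow(frmlasnum))
  for (x in seq_len(nrow(frmlasnum))) {
    f <- as.formula(frmlasnum$frmlas[x])  # string -> formula
    model <- lm(f, data = data)           # stand-in for ssn_lm(f, ssn.object = ...)
    files[x] <- file.path(out_dir, paste0("Model_", frmlasnum$ID[x], ".rds"))
    saveRDS(model, files[x])
  }
  invisible(files)                        # return the paths that were written
}

files <- fit_and_save(frmlasnum, mtcars)

# Read one model back to confirm the round trip
m3 <- readRDS(files[3])
formula(m3)
```

The lapply/mapply version above is shorter, but the loop makes it easy to add per-model steps such as logging or skipping files that already exist.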

Upvotes: 3
