Creating an R dataframe of gam models

Question

If I fit three different gam models as follows:

library(mgcv)
set.seed(0)
dat <- data.frame(count = rpois(100, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1))

m1 <- gam(count ~ s(pred1),
          data = dat,
          family = poisson(link="log"),
          method = "REML",
          select = TRUE)

m2 <- gam(count ~ s(pred2),
          data = dat, 
          family = poisson(link="log"), 
          method = "REML", 
          select = TRUE)

m3 <- gam(count ~ s(pred3),
          data = dat, 
          family = poisson(link="log"), 
          method = "REML", 
          select = TRUE)

And then try and put them into a single dataframe:

models <- data.frame(m = c(m1,m2,m3))

I get this error:

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class ‘"family"’ to a data.frame

Any ideas how to fix this? I want to create a structure that I can loop over to make some predictions from.

Parfait · Accepted Answer

As the docs indicate, the return value of mgcv::gam is an object of class gam. This gamObject inherits from base R's glm and lm classes, so it includes many underlying elements that cannot easily be bound into the two dimensions of a data frame:

Fitted Gam Object

A fitted GAM object returned by function gam and of class "gam" inheriting from classes "glm" and "lm". Method functions anova, logLik, influence, plot, predict, print, residuals and summary exist for this class.
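If the goal is only a structure to loop over for predictions, a plain list can hold the fitted objects unchanged, since list elements may be of any class. The following is a minimal sketch under that assumption, not part of the original answer; the names predictors, models and preds are illustrative, and the data are the simulated values from the question.

```r
library(mgcv)

set.seed(0)
dat <- data.frame(count = rpois(100, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1))

# Fit one Poisson GAM per predictor and keep the fits in a named list.
predictors <- c("pred1", "pred2", "pred3")
models <- lapply(predictors, function(col) {
  f <- as.formula(paste0("count ~ s(", col, ")"))
  gam(f, data = dat, family = poisson(link = "log"),
      method = "REML", select = TRUE)
})
names(models) <- predictors

# Each element is still a complete gam object.
class(models$pred1)

# Loop over the list to get predictions on the response scale.
preds <- lapply(models, predict, newdata = dat, type = "response")
str(preds)
```

Here newdata = dat is only a stand-in; any data frame containing the predictor columns could be passed instead.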

Usually, to retrieve estimates from these model objects, you would run summary, which returns a list of named elements (for a gam fit these include the coefficient tables p.table and s.table). From there, extract the components you need, which may be vectors, matrices, or lists, into data frames. Note: because the underlying components vary in length and type, there is no simple method to extract all estimates of a model into a single data frame.
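As a rough illustration of that extraction step, the sketch below pulls a few components out of summary() for a single model. The variable names param_df, smooth_df and fit_df are made up for this example, and which components are worth keeping is an assumption about what you need.

```r
library(mgcv)

set.seed(0)
dat <- data.frame(count = rpois(100, 1),
                  pred1 = rnorm(100, 10, 1))

fit <- gam(count ~ s(pred1), data = dat,
           family = poisson(link = "log"),
           method = "REML", select = TRUE)

results <- summary(fit)

# Parametric terms (here only the intercept) are stored as a matrix.
param_df <- as.data.frame(results$p.table)

# Smooth terms: one row per smooth, with edf, test statistic and p-value.
smooth_df <- as.data.frame(results$s.table)

# Scalar summaries fit naturally into a one-row data frame.
fit_df <- data.frame(r_sq = results$r.sq,
                     dev_expl = results$dev.expl,
                     n = results$n)

param_df
smooth_df
fit_df
```

The three pieces have different shapes, which is why they end up in separate data frames instead of one.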

You will have to ask yourself:

What specific model estimates do I want in a data frame?
Do I keep the estimates of all three models in one data frame, or use a list of data frames?
What indicator data (data, formula, etc.) do I store to distinguish each model from the others?
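One possible set of answers to these questions is sketched below: keep only the smooth-term table, store the predictor name and the formula as indicator columns, and build both a list of data frames and a single stacked data frame. The helper smooth_table and the column names model, formula and term are hypothetical choices for this example.

```r
library(mgcv)

set.seed(0)
dat <- data.frame(count = rpois(100, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1))

# Hypothetical helper: fit one model and return its smooth-term table
# together with indicator columns identifying the model.
smooth_table <- function(col) {
  f <- as.formula(paste0("count ~ s(", col, ")"))
  fit <- gam(f, data = dat, family = poisson(link = "log"),
             method = "REML", select = TRUE)
  tab <- as.data.frame(summary(fit)$s.table)
  data.frame(model = col,
             formula = paste(deparse(f), collapse = " "),
             term = rownames(tab),
             tab)
}

# Option 1: a named list of data frames, one per model.
tables <- lapply(setNames(nm = names(dat)[-1]), smooth_table)

# Option 2: one long data frame; the model column tells the rows apart.
all_terms <- do.call(rbind, tables)
rownames(all_terms) <- NULL
all_terms
```

The stacked form works here because every table has the same columns; components with differing shapes are easier to keep in the list form.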

StackOverflow R posts contain many examples of how to extract model estimates like coefficients into data frames.

One implementation is to define a function that extracts model estimates and takes the formula as its input parameter, since the formula appears to be the only difference between the three models.

run_gam_models <- function(my_formula) {
      fit <- gam(my_formula,
                 data = dat, 
                 family = poisson(link="log"), 
                 method = "REML", 
                 select = TRUE)

      results <- summary(fit)

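      # summary() of a gam fit has no $coefficients element; the parametric
      # coefficient table is in p.table (smooth terms are in s.table and
      # can be extracted the same way)
      df <- data.frame(results$p.table, check.names = FALSE)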
      return(df)
}

# LIST OF DATA FRAMES
coeffs_df_list <- sapply(names(dat)[-1], function(col) {
       # wrap the predictor in s() so the formula matches m1, m2 and m3
       f <- as.formula(paste0("count ~ s(", col, ")"))
       run_gam_models(f)
}, simplify = FALSE)

# INDIVIDUAL DATA FRAMES
coeffs_df_list$pred1
coeffs_df_list$pred2
coeffs_df_list$pred3

Online Demo (using glm)
