user9292

Reputation: 1145

Extracting outputs from a list and saving them in a data frame

I'm doing some modeling experiments and I need to present the output for multiple models in a specific format for further analysis.

Here is some code to generate multiple models:

# Generate the data
resp <- sample(0:1,100,TRUE)
x1 <- c(rep(5,20),rep(0,15), rep(2.5,40),rep(17,25))
x2 <- c(rep(23,10),rep(5,10), rep(15,40),rep(1,25), rep(2, 15))
x3 <- c(rep(2,10),rep(50,10), rep(1,40),rep(112,25), rep(22, 15))
dat <- data.frame(resp,x1, x2, x3)


# Build multiple models
InitLOogModel<-list()
n <- 3
for (i in 1:n)
{
  ### Create training and testing data
  ## 80% of the sample size
  # No seed is set, so a different random split is drawn on every iteration.
  smp_sizelogis <- floor(0.8 * nrow(dat))

  train_indlogis <- sample(seq_len(nrow(dat)), size = smp_sizelogis)

  trainlogis <- dat[train_indlogis, ]
  testlogis  <- dat[-train_indlogis, ]

  InitLOogModel[[i]] <- glm(resp ~ ., data =trainlogis, family=binomial)
}

Here is the output:

InitLOogModel
[[1]]

Call:  glm(formula = resp ~ ., family = binomial, data = trainlogis)

Coefficients:
(Intercept)           x1           x2           x3  
  -0.007270     0.004585    -0.015271    -0.009911  

Degrees of Freedom: 79 Total (i.e. Null);  76 Residual
Null Deviance:      106.8 
Residual Deviance: 104.5    AIC: 112.5

[[2]]

Call:  glm(formula = resp ~ ., family = binomial, data = trainlogis)

Coefficients:
(Intercept)           x1           x2           x3  
   1.009670    -0.058227    -0.058783    -0.008337  

Degrees of Freedom: 79 Total (i.e. Null);  76 Residual
Null Deviance:      110.1 
Residual Deviance: 108.1    AIC: 116.1

[[3]]

Call:  glm(formula = resp ~ ., family = binomial, data = trainlogis)

Coefficients:
(Intercept)           x1           x2           x3  
    1.51678     -0.06482     -0.07868     -0.01440  

Degrees of Freedom: 79 Total (i.e. Null);  76 Residual
Null Deviance:      110.5 
Residual Deviance: 106.3    AIC: 114.3

Note that the output here is a list. This is the output I need to create as a data frame (let's call it outDF):

    Model   Intercept   x1              x2          x3 
      1     -0.00727    0.004585    -0.015271   -0.009911 
      2     1.00967     -0.058227   -0.058783   -0.008337 
      3     1.51678     -0.06482    -0.07868    -0.0144   

Note that the numbers in each column of outDF are just the regression coefficients. For example, this is how to get them for Model 1:

as.data.frame(coef(summary(InitLOogModel[[1]]))[,1])

Upvotes: 2

Views: 84

Answers (3)

josliber

Reputation: 44299

You can loop through your list of models and grab the desired summary information with sapply:

as.data.frame(t(sapply(InitLOogModel, function(x) coef(summary(x))[,1])))
#   (Intercept)         x1          x2            x3
# 1   0.5047799 0.01932560 -0.01268125 -0.0041356214
# 2  -1.2712605 0.11281741  0.06717180  0.0050441023
# 3  -0.7052121 0.08568746  0.03964437  0.0003167443

sapply in this case creates a column of coefficients for each model. Since we want the models to be the rows instead of the columns, we use t to transpose the result.
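
The transpose step can be sketched as follows. This is a minimal illustration under stated assumptions: the stand-in data, the seed, and the names `models` and `coef_mat` are not from the question, and the `Model` column is added only to match the layout the question asks for.

```r
set.seed(42)  # assumed seed, only to make the sketch reproducible

# Stand-in data (hypothetical): a binary response and three numeric predictors
dat <- data.frame(resp = rbinom(100, 1, 0.5),
                  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# Fit three logistic models, each on a different random 80% subset
models <- lapply(1:3, function(i) {
  idx <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
  glm(resp ~ ., data = dat[idx, ], family = binomial)
})

# sapply returns one column per model (coefficients in rows, models in columns)
coef_mat <- sapply(models, function(m) coef(summary(m))[, 1])
dim(coef_mat)

# Transpose so each model is a row, then prepend a Model column
outDF <- data.frame(Model = seq_along(models), t(coef_mat), check.names = FALSE)
names(outDF)[2] <- "Intercept"  # rename "(Intercept)" to match the requested header
outDF
```

`check.names = FALSE` keeps `data.frame` from rewriting the `(Intercept)` column name before it is renamed explicitly.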

Upvotes: 2

Jake Kaupp

Reputation: 8072

You could also use a tidyverse solution, which I personally find produces easier-to-read code, at the expense of loading more packages.

EDIT: While @Ista may be right that the nested list-column data frame approach appears complex, it has the appeal of keeping every step of the analysis together, from data to model to model details. Beyond tidying each model's coefficients, it simply reshapes the data into the requested result.

I also prefer it to keeping things in lists of data frames, as I find it makes the results easier to access for downstream work. It boils down to preference in method and how well it fits your workflow.

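library(tidyverse)
library(modelr)  # provides resample()
library(broom)   # provides tidy()

smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))

# Note: rerun() and add_rownames() are deprecated in current purrr/dplyr releases
analysis <- rerun(3, resample(dat, sample(rows, size = smp_sizelogis))) %>%
  tibble(data = .) %>% 
  add_rownames("model_number") %>% 
  mutate(model = map(data, ~glm('resp ~ .', family = binomial, data = .))) %>% 
  mutate(coefs = map(model, tidy))

# Unnest the tidied coefficients so that term and estimate become columns
analysis %>% 
  unnest(coefs) %>% 
  select(model_number, term, estimate) %>% 
  spread(term, estimate) %>% 
  select(-`(Intercept)`)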

# A tibble: 3 × 4
  model_number          x1         x2          x3
*        <chr>       <dbl>      <dbl>       <dbl>
1            1 -0.08160034 0.03156254 0.008613346
2            2 -0.04740939 0.04084883 0.004282003
3            3 -0.05980735 0.01625652 0.002075468

Upvotes: 0

Ista

Reputation: 10437

The sapply approach in @josliber's answer is reasonable, but I would tend to prefer to leave the results in a list and combine them afterward. The principle is that the simplification sapply does is a convenience only -- if it is not convenient, don't use it. Just combine the results in whatever way makes sense for your specific situation. This principle leads to the following:

do.call(rbind, lapply( InitLOogModel, coef))

I know that coef returns a named vector for each fitted model, and since I know that each model has the same coefficients, I know that it makes sense to rbind them. Notice that I avoid taking the summary of each model, since that doesn't produce anything needed for the result we want to achieve.

Of course, do.call(rbind, ...) returns a matrix instead of a data.frame. If a data.frame is desired, the matrix can be converted with as.data.frame:

as.data.frame(do.call(rbind, lapply(InitLOogModel, coef)))
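
The two points above (summary is not needed, and the rbind result is a matrix until converted) can be checked with a small sketch. The stand-in data, the seed, and the names `models`, `coef_rbind`, and `via_summary` are assumptions made for illustration, not part of the question.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# Stand-in data (hypothetical): a binary response and three numeric predictors
dat <- data.frame(resp = rbinom(100, 1, 0.5),
                  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# Fit three logistic models, each on a different random 80% subset
models <- lapply(1:3, function(i) {
  idx <- sample(nrow(dat), size = 80)
  glm(resp ~ ., data = dat[idx, ], family = binomial)
})

coef_list  <- lapply(models, coef)        # list of named numeric vectors
coef_rbind <- do.call(rbind, coef_list)   # one row per model
is.matrix(coef_rbind)                     # a matrix, not a data.frame

# The estimates match the first column of each summary coefficient table
via_summary <- t(sapply(models, function(m) coef(summary(m))[, 1]))
all.equal(coef_rbind, via_summary)

# Convert only when a data.frame is actually needed
coef_df <- as.data.frame(coef_rbind)
```

One caveat: if a model has aliased (NA) coefficients, `coef(summary(m))` drops those rows while `coef(m)` keeps them, so the two routes agree only when every term is estimable.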

EDIT: Inspired by @jake-kaupp's answer, here is how I would do it in the tidyverse.

The combining of coefficients looks very similar to the base R approach above:

library(tidyverse)
map(InitLOogModel, coef) %>%
  reduce(rbind)

The for-loop used to construct the list of models can be replaced with the following, where resample comes from modelr:

library(modelr)

smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))

rerun(3, dat %>%
         resample(sample(rows, size = smp_sizelogis))) %>%
  map(function(x) glm(resp ~ ., family = binomial, data = x))

Putting the whole thing together gives us

smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))

rerun(3, dat %>%
         resample(sample(rows, size = smp_sizelogis))) %>%
  map(function(x) glm(resp ~ ., family = binomial, data = x)) %>%
  map(coef) %>%
  reduce(rbind)

The main advantages over @jake-kaupp's answer are that a) we don't compute anything we don't need, and b) we never put the results into a data.frame, so we never have to think about how to get the pieces we want back out.

Upvotes: 0
