Reputation: 1145
I'm doing some modeling experiments and I need to present the output for multiple models in a specific format for further analysis.
Here is some code to generate multiple models:
# This generates the data
resp <- sample(0:1, 100, TRUE)
x1 <- c(rep(5, 20), rep(0, 15), rep(2.5, 40), rep(17, 25))
x2 <- c(rep(23, 10), rep(5, 10), rep(15, 40), rep(1, 25), rep(2, 15))
x3 <- c(rep(2, 10), rep(50, 10), rep(1, 40), rep(112, 25), rep(22, 15))
dat <- data.frame(resp, x1, x2, x3)
# This builds multiple models
InitLOogModel <- list()
n <- 3
for (i in 1:n)
{
  ### Create training and testing data
  ## 80% of the sample size
  # Note that I didn't set a seed, so a random split is performed at every iteration.
  smp_sizelogis <- floor(0.8 * nrow(dat))
  train_indlogis <- sample(seq_len(nrow(dat)), size = smp_sizelogis)
  trainlogis <- dat[train_indlogis, ]
  testlogis <- dat[-train_indlogis, ]
  InitLOogModel[[i]] <- glm(resp ~ ., data = trainlogis, family = binomial)
}
Here is the output:
InitLOogModel
[[1]]
Call: glm(formula = resp ~ ., family = binomial, data = trainlogis)
Coefficients:
(Intercept) x1 x2 x3
-0.007270 0.004585 -0.015271 -0.009911
Degrees of Freedom: 79 Total (i.e. Null); 76 Residual
Null Deviance: 106.8
Residual Deviance: 104.5 AIC: 112.5
[[2]]
Call: glm(formula = resp ~ ., family = binomial, data = trainlogis)
Coefficients:
(Intercept) x1 x2 x3
1.009670 -0.058227 -0.058783 -0.008337
Degrees of Freedom: 79 Total (i.e. Null); 76 Residual
Null Deviance: 110.1
Residual Deviance: 108.1 AIC: 116.1
[[3]]
Call: glm(formula = resp ~ ., family = binomial, data = trainlogis)
Coefficients:
(Intercept) x1 x2 x3
1.51678 -0.06482 -0.07868 -0.01440
Degrees of Freedom: 79 Total (i.e. Null); 76 Residual
Null Deviance: 110.5
Residual Deviance: 106.3 AIC: 114.3
Note that the output here is a list. This is the output I need to create as a data frame (let's call it outDF):
Model Intercept x1 x2 x3
1 -0.00727 0.004585 -0.015271 -0.009911
2 1.00967 -0.058227 -0.058783 -0.008337
3 1.51678 -0.06482 -0.07868 -0.0144
Note that the numbers inside each column of outDF are just the regression coefficients. This is how to get them for Model 1, for example:
as.data.frame(coef(summary(InitLOogModel[[1]]))[,1])
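For reference, the inner coef(summary(...))[, 1] call just returns a named numeric vector; here is a quick sketch of the structure (values will differ on every run, since no seed is set):
# Sketch: the per-model extraction is a named vector; names come from the glm fit
coefs1 <- coef(summary(InitLOogModel[[1]]))[, 1]
names(coefs1)  # "(Intercept)" "x1" "x2" "x3"
# outDF should contain one such row per model, plus a Model column.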
Upvotes: 2
Views: 84
Reputation: 44299
You can loop through your list of models and grab the desired summary information with sapply:
as.data.frame(t(sapply(InitLOogModel, function(x) coef(summary(x))[,1])))
# (Intercept) x1 x2 x3
# 1 0.5047799 0.01932560 -0.01268125 -0.0041356214
# 2 -1.2712605 0.11281741 0.06717180 0.0050441023
# 3 -0.7052121 0.08568746 0.03964437 0.0003167443
sapply in this case creates a column of coefficients for each model. Since we want the models to be the rows instead of the columns, we use t to transpose the result.
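To see why the transpose is needed, here is a small sketch of the shapes involved, assuming the three models built above:
# Sketch: before t(), sapply gives one named row per coefficient and one column per model
coef_mat <- sapply(InitLOogModel, function(x) coef(summary(x))[, 1])
dim(coef_mat)     # 4 coefficients x 3 models
dim(t(coef_mat))  # 3 models x 4 coefficients, matching the rows of outDF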
Upvotes: 2
Reputation: 8072
You could also use a tidyverse solution, which I personally find produces easier-to-read code, at the expense of using more packages.
EDIT: While @Ista may be right about the nested list-column approach appearing complex, it has the appeal of keeping the full steps of the analysis, from data to model to model details, in one object. This approach doesn't compute anything extra; it simply manipulates the data into the requested result.
I also prefer it to keeping things in lists of data frames, as I find it makes things easier to access for downstream work. It boils down to preference and how well the method fits your workflow.
library(tidyverse)
library(modelr)  # for resample()
library(broom)   # for tidy()
smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))
analysis <- rerun(3, resample(dat, sample(rows, size = smp_sizelogis))) %>%
  tibble(data = .) %>%
  add_rownames("model_number") %>%
  mutate(model = map(data, ~glm('resp ~ .', family = binomial, data = .))) %>%
  mutate(coefs = map(model, tidy))
analysis %>%
  unnest(coefs) %>%
  select(model_number, term, estimate) %>%
  spread(term, estimate) %>%
  select(-`(Intercept)`)
# A tibble: 3 × 4
model_number x1 x2 x3
* <chr> <dbl> <dbl> <dbl>
1 1 -0.08160034 0.03156254 0.008613346
2 2 -0.04740939 0.04084883 0.004282003
3 3 -0.05980735 0.01625652 0.002075468
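A nice side effect of the nested frame is that the intermediate objects stay attached to each row; here is a small sketch of pulling them back out for downstream work, assuming the analysis object built above:
# Sketch: retrieve the pieces for a single row of the nested tibble
first_model <- analysis$model[[1]]                # the fitted glm for model 1
first_train <- as.data.frame(analysis$data[[1]])  # its training resample as a data frame
summary(first_model)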
Upvotes: 0
Reputation: 10437
The sapply approach in @josliber's answer is reasonable, but I would tend to prefer to leave the results in a list and combine them afterward. The principle is that the simplification sapply performs is a convenience only; if it is not convenient, don't use it. Just combine the results in whatever way makes sense for your specific situation. This principle leads to the following:
do.call(rbind, lapply(InitLOogModel, coef))
I know that coef.lm returns a vector, and since I know that each model has the same coefficients, I know that it makes sense to rbind them. Notice that I avoid taking the summary of each model, since that doesn't produce anything needed for the result we want to achieve.
Of course do.call(rbind, ...) returns a matrix instead of a data.frame. If a data.frame is desired, the matrix can be converted with as.data.frame:
as.data.frame(do.call(rbind, lapply(InitLOogModel, coef)))
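To match the requested outDF layout exactly (a Model column and an Intercept header without the parentheses), a couple of extra base R steps are enough; a sketch, assuming the default coefficient names from glm:
# Sketch: shape the combined coefficients into the requested outDF
coef_df <- as.data.frame(do.call(rbind, lapply(InitLOogModel, coef)))
outDF <- cbind(Model = seq_along(InitLOogModel), coef_df)
names(outDF)[names(outDF) == "(Intercept)"] <- "Intercept"
outDF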
EDIT: Inspired by @jake-kaupp's answer, here is how I would do it in the tidyverse:
The combining of coefficients looks very similar to the base R approach above:
library(tidyverse)
map(InitLOogModel, coef) %>%
  reduce(rbind)
The for-loop used to construct the list of models can be replaced with:
library(modelr)
smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))
rerun(3, dat %>%
        resample(sample(rows, size = smp_sizelogis))) %>%
  map(function(x) glm(resp ~ ., family = binomial, data = x))
Putting the whole thing together gives us:
smp_sizelogis <- floor(0.8 * nrow(dat))
rows <- seq_len(nrow(dat))
rerun(3, dat %>%
        resample(sample(rows, size = smp_sizelogis))) %>%
  map(function(x) glm(resp ~ ., family = binomial, data = x)) %>%
  map(coef) %>%
  reduce(rbind)
The main advantages over @jake-kaupp's answer are that a) we don't compute stuff we don't need, and b) we never stuff the results into a data.frame, so we never have to think about how to get the pieces we want back out.
Upvotes: 0