Reputation: 1792
I've been using map() to calculate and extract certain statistics from multiple lm() models.
To give a reproducible example using the mtcars dataset, I start with an input vector of formulae to be estimated with lm():
library(tidyverse)
df <- mtcars
input_char <- c("mpg ~ disp",
                "mpg ~ disp + hp")
input_formula <- map(input_char, formula)
I then have a function that calculates and extracts the relevant statistics for each model. For reproducibility, here's a simplified version that extracts only the R-squared of the model.
get_rsquared <- function(a_formula) {
  model1 <- lm(a_formula, data = df)
  rsquared <- summary(model1)$r.squared
  c(model = a_formula, rsquared = rsquared)
}
I've then used map() to iterate through the formulae and extract the R-squared from each model.
models <- map(input_formula, get_rsquared)
models
which gives the output:
[[1]]
[[1]]$model
mpg ~ disp
<environment: 0x7f98987f4000>
[[1]]$rsquared
[1] 0.7183433
[[2]]
[[2]]$model
mpg ~ disp + hp
<environment: 0x7f98987f4000>
[[2]]$rsquared
[1] 0.7482402
My question is regarding the output being a list.
Is there a simple way to make the output a dataframe?
My desired output is:
#> model rsquared
#> 1 mpg ~ disp 0.7183433
#> 2 mpg ~ disp + hp 0.7482402
Upvotes: 0
Views: 536
Reputation: 34441
Keep the formulas as character strings and call as.formula() inside the get_rsquared() function, since character strings are easier to work with than formula objects.
library(purrr)
library(dplyr)
df <- mtcars
input_char <- c("mpg ~ disp",
                "mpg ~ disp + hp")
get_rsquared <- function(a_formula) {
  model1 <- lm(as.formula(a_formula), data = df)
  rsquared <- summary(model1)$r.squared
  list(model = a_formula, rsquared = rsquared)
}
map_df(input_char, get_rsquared)
# A tibble: 2 x 2
  model           rsquared
  <chr>              <dbl>
1 mpg ~ disp         0.718
2 mpg ~ disp + hp    0.748
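As a side note, map_df() is marked as superseded in purrr 1.0.0 and later, where the suggested pattern is map() followed by list_rbind(). Below is a minimal sketch of that variant, assuming purrr >= 1.0.0 is installed. The name get_rsquared_row() and its data argument are my own illustrative choices, not part of the original code; the function returns a one-row data frame because list_rbind() expects data frames rather than plain lists.

```r
library(purrr)  # assumes purrr >= 1.0.0, where list_rbind() is available

input_char <- c("mpg ~ disp",
                "mpg ~ disp + hp")

# Hypothetical variant of get_rsquared(): takes the data as an argument
# (defaulting to mtcars) and returns a one-row data frame.
get_rsquared_row <- function(a_formula, data = mtcars) {
  model1 <- lm(as.formula(a_formula), data = data)
  data.frame(model = a_formula,
             rsquared = summary(model1)$r.squared,
             stringsAsFactors = FALSE)
}

# Fit each model, then stack the one-row data frames into one data frame.
result <- list_rbind(map(input_char, get_rsquared_row))
result
```

This should give the same two columns (model as character, rsquared as numeric), just as a plain data frame rather than a tibble. Passing the data in as an argument also avoids relying on a global df.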
Upvotes: 2