Reputation: 297
db = tibble(a = rnorm(100), b = rnorm(100), c = rnorm(100))
If I want a tidy multivariate linear regression, I just can go
lm(data = db, 0 + a ~ b + c) %>% tidy()
But if I want multiple univariate regressions I would go
lm(data = db, a ~ 0 + b) %>% tidy() %>%
add_row(lm(data = db, a ~ 0 + c) %>% tidy())
Now, given many regressor columns, I would like to avoid to code every single regressor as a new add_row
, how should I make the code more synthetic?
This has a partial solution here:
Tidy output from many single-variable models using purrr, broom
I think the code can be even more lean than in the example?
Upvotes: 1
Views: 596
Reputation: 696
I'll add yet another way of doing this. The motivation for this arose from me wanting to add also the (adjusted) r squared to the data frame.
I think this method can be easily extended or modified to other use cases as well.
# Create a list of data frames
lapply(
# The names of the (univariate) regressors
c( 'carat', 'depth', 'table' ),
function( x, y, data ) {
# Create the model from a data frame with
# the outcome variable (y) as the first and
# the regressor (x) as the second variable
model <- data |>
select( all_of( c( y, x ) ) ) |>
lm()
# Create a summary of the model
smry <- summary( model )
# Create a data frame from the coefficients in the summary
lm_df <- as.data.frame( smry$coefficients ) |>
# The variable name and "intercept" are in row names
rownames_to_column( 'variable' ) |>
# Drop the intercept row
filter( variable == x ) |>
# Add the adjusted r squared of the model
mutate( adj.r.squared = smry$adj.r.squared )
},
# Arguments for the function
y = 'price', # the name of the outcome variable
data = diamonds
) |>
# Bind the data frames into a single data frame
bind_rows()
# variable Estimate Std. Error t value Pr(>|t|) adj.r.squared
# 1 carat 7756.42562 14.066579 551.408112 0.000000e+00 8.493277e-01
# 2 depth -29.64997 11.989704 -2.472953 1.340325e-02 9.482952e-05
# 3 table 226.98375 7.625135 29.767833 3.761534e-193 1.614479e-02
Upvotes: 1
Reputation: 78927
You could do something like this: Depending your columns:
library(broom)
vars <- names(db)[-1]
models <- list()
for (i in 1:2){
vc <- combn(vars,i)
for (j in 1:ncol(vc)){
model <- as.formula(paste0("a ~", paste0(vc[,j], collapse = "+")))
models <- c(models, model)
}
}
lapply(models, function(x) lm(x, data = db) %>% tidy())
[[1]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0155 0.0856 0.181 0.857
2 b -0.0502 0.0797 -0.630 0.530
[[2]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0113 0.0856 0.132 0.896
2 c 0.0553 0.0865 0.640 0.524
[[3]]
# A tibble: 3 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0132 0.0860 0.153 0.878
2 b -0.0439 0.0807 -0.544 0.588
3 c 0.0486 0.0877 0.555 0.580
Upvotes: 1
Reputation: 297
My answer
db %>%
select(-a) %>%
names() %>%
paste('a~0+',.)%>%
map_df(~tidy(lm(as.formula(.x),
data= db,
)))
Upvotes: 2
Reputation: 887118
We could use {}
to block the multiple expressions
library(magrittr)
library(broom)
lm(data = db, a ~ 0 + b) %>%
tidy() %>%
{add_row(., lm(data = db, a ~ 0 + c) %>%
tidy())}
-output
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 b 0.0601 0.0907 0.663 0.509
2 c 0.0411 0.0899 0.457 0.649
Or may do this within summarise
and unnest
library(tidyr)
db %>%
summarise(out1 = list(bind_rows(lm(a ~ 0 + b) %>% tidy,
lm(a~ 0 + c) %>% tidy))) %>%
unnest(out1)
-output
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 b 0.0601 0.0907 0.663 0.509
2 c 0.0411 0.0899 0.457 0.649
Upvotes: 1