Reputation: 297

Tidy multiple univariate regressions

db = tibble(a = rnorm(100), b = rnorm(100), c = rnorm(100))

If I want a tidy multivariate linear regression, I just can go

lm(data = db, 0 + a ~ b + c) %>% tidy()

But if I want multiple univariate regressions I would go

lm(data = db, a ~ 0 + b) %>% tidy() %>%
  add_row(lm(data = db, a ~ 0 + c) %>% tidy())

Now, given many regressor columns, I would like to avoid to code every single regressor as a new add_row, how should I make the code more synthetic?

This has a partial solution here:

Tidy output from many single-variable models using purrr, broom

I think the code can be even more lean than in the example?

Upvotes: 1

Answers (4)

teppo

Reputation: 696

I'll add yet another way of doing this. The motivation for this arose from me wanting to add also the (adjusted) r squared to the data frame.

I think this method can be easily extended or modified to other use cases as well.

# Create a list of data frames
lapply(
    
    # The names of the (univariate) regressors
    c( 'carat', 'depth', 'table' ),
    
    function( x, y, data ) {
        
        # Create the model from a data frame with
        # the outcome variable (y) as the first and
        # the regressor (x) as the second variable
        model <- data |>
            select( all_of( c( y, x ) ) ) |>
            lm()
        
        # Create a summary of the model
        smry <- summary( model )
        
        # Create a data frame from the coefficients in the summary
        lm_df <- as.data.frame( smry$coefficients ) |>

            # The variable name and "intercept" are in row names
            rownames_to_column( 'variable' ) |>

            # Drop the intercept row
            filter( variable == x ) |>

            # Add the adjusted r squared of the model
            mutate( adj.r.squared = smry$adj.r.squared )
    },
    
    # Arguments for the function
    y = 'price',  # the name of the outcome variable
    data = diamonds

) |>
    
    # Bind the data frames into a single data frame
    bind_rows()

#   variable   Estimate Std. Error    t value      Pr(>|t|) adj.r.squared
# 1    carat 7756.42562  14.066579 551.408112  0.000000e+00  8.493277e-01
# 2    depth  -29.64997  11.989704  -2.472953  1.340325e-02  9.482952e-05
# 3    table  226.98375   7.625135  29.767833 3.761534e-193  1.614479e-02

Upvotes: 1

TarJae

Reputation: 79311

You could do something like this: Depending your columns:

library(broom)

vars <- names(db)[-1]

models <- list()

for (i in 1:2){
  vc <- combn(vars,i)
  for (j in 1:ncol(vc)){
    model <- as.formula(paste0("a ~", paste0(vc[,j], collapse = "+")))
    models <- c(models, model)
  }
}

lapply(models, function(x) lm(x, data = db) %>% tidy())

[[1]]
# A tibble: 2 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0155    0.0856     0.181   0.857
2 b            -0.0502    0.0797    -0.630   0.530

[[2]]
# A tibble: 2 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0113    0.0856     0.132   0.896
2 c             0.0553    0.0865     0.640   0.524

[[3]]
# A tibble: 3 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0132    0.0860     0.153   0.878
2 b            -0.0439    0.0807    -0.544   0.588
3 c             0.0486    0.0877     0.555   0.580

Upvotes: 1

GiulioGCantone

Reputation: 297

My answer

db %>%
  select(-a) %>%
  names() %>%
  paste('a~0+',.)%>%
  map_df(~tidy(lm(as.formula(.x), 
               data= db, 
               )))

Upvotes: 2

akrun

Reputation: 887961

We could use {} to block the multiple expressions

library(magrittr)
library(broom)
lm(data = db, a ~ 0 + b) %>%
     tidy() %>%  
    {add_row(., lm(data = db, a ~ 0 + c) %>% 
         tidy())}

-output

# A tibble: 2 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 b       0.0601    0.0907     0.663   0.509
2 c       0.0411    0.0899     0.457   0.649

Or may do this within summarise and unnest

library(tidyr)
db %>% 
   summarise(out1 = list(bind_rows(lm(a ~ 0 + b) %>% tidy, 
                    lm(a~ 0 + c) %>% tidy))) %>%
   unnest(out1)

-output

# A tibble: 2 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 b       0.0601    0.0907     0.663   0.509
2 c       0.0411    0.0899     0.457   0.649

Upvotes: 1

Tidy multiple univariate regressions

Answers (4)

Related Questions