Reputation:

Is there a way to create multiple regression outputs for a list of dependent variables to one independent variable?

I have a data set of about 500 stocks with their returns. I was wondering if there was a way to generate multiple regression outputs, a 1 to 1 relationship, for a dependent variable to one independent?

For example, here is a simple regression output for one of the stocks MSFT compared to the market. Instead of creating multiple lines like the ones I have below, is there a way to put each of the 500 variables into the Y component of the formula and generate a new output for each? Maybe even go as far as put all this data into one table?

These are the variables that I am trying to group to ease my process: screenshot of variable table

regression_model <- lm(raw_data$MSFT~raw_data$VFINX, raw_data)
summary(regression_model)

Upvotes: 1

Answers (1)

neilfws

Reputation: 33812

One approach to this (which is sure to be a duplicate answer somewhere?) is:

reshape the data to long format
nest the data to create a column of tibbles containing the independent and dependent values
run each regression and store the results in a new column
tidy the regression output and unnest the data

Your example data is not usable, so let's use mtcars. Assume we want to predict mpg using each of disp, hp, drat, wt and qsec. First we select, gather and nest:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

mtcars %>% 
  select(mpg, disp, hp, drat, wt, qsec) %>% 
  gather(Var, Val, -mpg) %>% 
  nest(data = c(mpg, Val))

Result:

# A tibble: 5 x 2
  Var   data             
  <chr> <list>           
1 disp  <tibble [32 x 2]>
2 hp    <tibble [32 x 2]>
3 drat  <tibble [32 x 2]>
4 wt    <tibble [32 x 2]>
5 qsec  <tibble [32 x 2]>

Now we can map each row to a regression, and create the column of tidied output:

mtcars %>% 
  select(mpg, disp, hp, drat, wt, qsec) %>% 
  gather(Var, Val, -mpg) %>% 
  nest(data = c(mpg, Val)) %>% 
  mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
         tidied = map(model, tidy))

# A tibble: 5 x 4
  Var   data              model  tidied          
  <chr> <list>            <list> <list>          
1 disp  <tibble [32 x 2]> <lm>   <tibble [2 x 5]>
2 hp    <tibble [32 x 2]> <lm>   <tibble [2 x 5]>
3 drat  <tibble [32 x 2]> <lm>   <tibble [2 x 5]>
4 wt    <tibble [32 x 2]> <lm>   <tibble [2 x 5]>
5 qsec  <tibble [32 x 2]> <lm>   <tibble [2 x 5]>

And finally, select the columns we want and unnest:

mtcars %>% 
  select(mpg, disp, hp, drat, wt, qsec) %>% 
  gather(Var, Val, -mpg) %>% 
  nest(data = c(mpg, Val)) %>% 
  mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
         tidied = map(model, tidy)) %>% 
  select(-model, -data) %>% 
  unnest(cols = c(tidied))

Result:

# A tibble: 10 x 6
   Var   term        estimate std.error statistic  p.value
   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1 disp  (Intercept)  29.6      1.23       24.1   3.58e-21
 2 disp  Val          -0.0412   0.00471    -8.75  9.38e-10
 3 hp    (Intercept)  30.1      1.63       18.4   6.64e-18
 4 hp    Val          -0.0682   0.0101     -6.74  1.79e- 7
 5 drat  (Intercept)  -7.52     5.48       -1.37  1.80e- 1
 6 drat  Val           7.68     1.51        5.10  1.78e- 5
 7 wt    (Intercept)  37.3      1.88       19.9   8.24e-19
 8 wt    Val          -5.34     0.559      -9.56  1.29e-10
 9 qsec  (Intercept)  -5.11    10.0        -0.510 6.14e- 1
10 qsec  Val           1.41     0.559       2.53  1.71e- 2

You could add filters from dplyr::filter(), for example to remove the Intercept rows, or select a p-value threshold.

mtcars %>% 
  select(mpg, disp, hp, drat, wt, qsec) %>% 
  gather(Var, Val, -mpg) %>% 
  nest(data = c(mpg, Val)) %>% 
  mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
         tidied = map(model, tidy)) %>% 
  select(-model, -data) %>% 
  unnest(cols = c(tidied)) %>% 
  filter(p.value < 0.01,
         term != "(Intercept)")

# A tibble: 5 x 6
  Var   term  estimate std.error statistic  p.value
  <chr> <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 disp  Val    -0.0412   0.00471     -8.75 9.38e-10
2 hp    Val    -0.0682   0.0101      -6.74 1.79e- 7
3 drat  Val     7.68     1.51         5.10 1.78e- 5
4 wt    Val    -5.34     0.559       -9.56 1.29e-10

Upvotes: 1

Is there a way to create multiple regression outputs for a list of dependent variables to one independent variable?

Answers (1)

Related Questions