Alina
Alina

Reputation: 21

R: How can I convert a list of linear regression results to a dataframe?

I am running a multivariate regression on ~150 different outcomes. Because gathering the results by individual tables or by hand is tidious, obviously, I have tried to produce a datafram out of the results. So far my steps:

I made a function for the regression:

f1 <- function(X){summary(lm(X~HFres + age + sex + season + nonalcTE, data=dslin))}

I applied apply() to make a list (I only used a few of the 150 outcomes while trying to make it work)

m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)

Then I change the object into a dataframe:

m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

This is the result:

> m2
>A tibble: 29 x 9
>   variables   `(Intercept)`[,1]   [,2]  [,3]      [,4] HFres[,1]    [,2]    [,3]  [,4]
>   <chr>                   <dbl>  <dbl> <dbl>     <dbl>     <dbl>   <dbl>   <dbl> <dbl>
> 1 C_101_IL8                3.59 0.106   34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA              9.28 0.0844 110.  0         0.00425   0.00293 1.45    0.147
> 3 C_103_AM                 4.92 0.0820  60.0 0         0.00261   0.00285 0.916   0.360
> 4 C_105_CD40L              7.53 0.164   45.9 0         0.00549   0.00570 0.964   0.335
> 5 C_106_GDF15              6.97 0.0864  80.7 0         0.00196   0.00300 0.653   0.514
> 6 C_107_PlGF               6.25 0.0665  94.0 0         0.00219   0.00231 0.947   0.344
> 7 C_108_SELE               4.89 0.117   41.8 1.14e-321 0.000978  0.00406 0.241   0.810
> 8 C_109_EGF                6.59 0.157   41.9 1.8 e-322 0.00714   0.00546 1.31    0.191
> 9 C_110_OPG                8.21 0.0673 122.  0         0.000320  0.00234 0.137   0.891
>10 C_111_SRC                7.62 0.0511 149.  0         0.000660  0.00177 0.372   0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
>  seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
>  nonalcTE <dbl[,4]>

It's a bit bad to see here but initially in m1 I had two columns, one with the variables and one with a list. Then after unnesting I have several columns which still each have 4 columns.

When I export this to excell (with the rio package) only the [,1] columns show up because the columns '(Intercept)', HF res, ecc. are still nested.

I have tried applying the unnest_wider() command again

m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', 'seasonfall', 'seasonspring', 'seasonsummer')

This didn't work, because it didn't accept that I want to unnest a list of columns instead of a dataframe.

I then tried it for only one of the variables to start with

m2 %>% unnest_wider(HFres)

This also gave me errors.

So, my remaining problem is I still need to unnest the columns of m2 in order to make them all visible when I export them.

Alternatively, It would be enough for me to have only the [,1] and [,4] subcolumn of each column if that is easier to extract them. I know I can e.g. access one subcolumn like this: m2[["age"]][,1] and maybe I could make a new dataframe from m2 extracting all the columns I want?

Thank you for your help!

Update: reprex ( I hope this is a correct understanding of what a reprex is)

create dataframe

age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16) bmi <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21) educ <- c(4,2,5,1,3,2,4,5,2,3) smoking <- c(1,3,2,2,3,2,1,3,2,1) HF <- c(3,4,2,4,5,3,2,3,5,2) P1 <- c(5,4,7,9,5,6,7,3,4,2) P2 <- c(7,2,4,6,5,3,2,5,6,3) P3 <- c(6,4,2,3,5,7,3,2,5,6)

df <- data.frame(age, bmi, educ, smoking, HF, P1, P2, P3)

function f1 <- function(X){summary(lm(X~HF + age + bmi + educ + smoking, data=df))}

apply function to columns m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)

m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)

I basically need the coefficient (beta) which is the [,1] of each column and the p-value which is the [,4]

Upvotes: 2

Views: 1640

Answers (1)

zephryl
zephryl

Reputation: 17204

The broom package is intended for exactly this — turning model results into tidy dataframes. Here’s an example using broom::tidy() to get a table of coefficients for each dv, and purrr::map_dfr() to iterate over dvs, row-bind the coefficient tables, and add a column with the dv for each model:

library(broom)
library(purrr)

f1 <- function(X) {
  tidy(lm(
    as.formula(paste(X, "~ mpg * cyl")),
    data = mtcars
  ))
}

model_results <- map_dfr(
  set_names(names(mtcars)[3:11]), 
  f1,
  .id = "dv"
)

model_results

Output:

# A tibble: 36 x 6
   dv    term         estimate std.error statistic  p.value
   <chr> <chr>           <dbl>     <dbl>     <dbl>    <dbl>
 1 disp  (Intercept) -242.      154.        -1.57  0.128   
 2 disp  mpg           10.3       6.47       1.59  0.123   
 3 disp  cyl          103.       22.9        4.52  0.000104
 4 disp  mpg:cyl       -3.24      1.18      -2.75  0.0104  
 5 hp    (Intercept)  -86.5     123.        -0.704 0.487   
 6 hp    mpg            4.59      5.16       0.889 0.381   
 7 hp    cyl           50.3      18.2        2.75  0.0102  
 8 hp    mpg:cyl       -1.47      0.940     -1.57  0.128   
 9 drat  (Intercept)    3.34      1.28       2.61  0.0145  
10 drat  mpg            0.0541    0.0538     1.01  0.323   
# ... with 26 more rows

If you want dvs in rows and coefficients in columns, you can tidyr::pivot_wider():

library(tidyr)

model_coefs <- pivot_wider(
  model_results,
  id_cols = dv,
  names_from = term,
  values_from = estimate
)

model_coefs

Output:

# A tibble: 9 x 5
  dv    `(Intercept)`      mpg      cyl `mpg:cyl`
  <chr>         <dbl>    <dbl>    <dbl>     <dbl>
1 disp       -242.    10.3     103.     -3.24    
2 hp          -86.5    4.59     50.3    -1.47    
3 drat          3.34   0.0541   -0.0354 -0.00533 
4 wt            2.98  -0.00947   0.478  -0.0219  
5 qsec         25.0   -0.0938   -0.862   0.000318
6 vs            2.38  -0.0194   -0.292   0.00223 
7 am           -0.908  0.0702    0.0721 -0.00470 
8 gear          4.22   0.0115   -0.181   0.00311 
9 carb          3.32  -0.0830    0.249  -0.00333

Upvotes: 1

Related Questions