Reputation: 21
I am running a multivariate regression on ~150 different outcomes. Because gathering the results by individual tables or by hand is tidious, obviously, I have tried to produce a datafram out of the results. So far my steps:
I made a function for the regression:
f1 <- function(X){summary(lm(X~HFres + age + sex + season + nonalcTE, data=dslin))}
I applied apply()
to make a list (I only used a few of the 150 outcomes while trying to make it work)
m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)
Then I change the object into a dataframe:
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
This is the result:
> m2
>A tibble: 29 x 9
> variables `(Intercept)`[,1] [,2] [,3] [,4] HFres[,1] [,2] [,3] [,4]
> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
> 1 C_101_IL8 3.59 0.106 34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA 9.28 0.0844 110. 0 0.00425 0.00293 1.45 0.147
> 3 C_103_AM 4.92 0.0820 60.0 0 0.00261 0.00285 0.916 0.360
> 4 C_105_CD40L 7.53 0.164 45.9 0 0.00549 0.00570 0.964 0.335
> 5 C_106_GDF15 6.97 0.0864 80.7 0 0.00196 0.00300 0.653 0.514
> 6 C_107_PlGF 6.25 0.0665 94.0 0 0.00219 0.00231 0.947 0.344
> 7 C_108_SELE 4.89 0.117 41.8 1.14e-321 0.000978 0.00406 0.241 0.810
> 8 C_109_EGF 6.59 0.157 41.9 1.8 e-322 0.00714 0.00546 1.31 0.191
> 9 C_110_OPG 8.21 0.0673 122. 0 0.000320 0.00234 0.137 0.891
>10 C_111_SRC 7.62 0.0511 149. 0 0.000660 0.00177 0.372 0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
> seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
> nonalcTE <dbl[,4]>
It's a bit bad to see here but initially in m1 I had two columns, one with the variables and one with a list. Then after unnesting I have several columns which still each have 4 columns.
When I export this to excell (with the rio
package) only the [,1]
columns show up because the columns '(Intercept)'
, HF res
, ecc. are still nested.
I have tried applying the unnest_wider()
command again
m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', 'seasonfall', 'seasonspring', 'seasonsummer')
This didn't work, because it didn't accept that I want to unnest a list of columns instead of a dataframe.
I then tried it for only one of the variables to start with
m2 %>% unnest_wider(HFres)
This also gave me errors.
So, my remaining problem is I still need to unnest the columns of m2 in order to make them all visible when I export them.
Alternatively, It would be enough for me to have only the [,1]
and [,4]
subcolumn of each column if that is easier to extract them. I know I can e.g. access one subcolumn like this: m2[["age"]][,1]
and maybe I could make a new dataframe from m2 extracting all the columns I want?
Thank you for your help!
Update: reprex ( I hope this is a correct understanding of what a reprex is)
create dataframe
age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16)
bmi <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21)
educ <- c(4,2,5,1,3,2,4,5,2,3)
smoking <- c(1,3,2,2,3,2,1,3,2,1)
HF <- c(3,4,2,4,5,3,2,3,5,2)
P1 <- c(5,4,7,9,5,6,7,3,4,2)
P2 <- c(7,2,4,6,5,3,2,5,6,3)
P3 <- c(6,4,2,3,5,7,3,2,5,6)
df <- data.frame(age, bmi, educ, smoking, HF, P1, P2, P3)
function
f1 <- function(X){summary(lm(X~HF + age + bmi + educ + smoking, data=df))}
apply function to columns
m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
I basically need the coefficient
(beta) which is the [,1]
of each column and the p-value which is the [,4]
Upvotes: 2
Views: 1640
Reputation: 17204
The broom package is intended for exactly this — turning model results into tidy dataframes. Here’s an example using broom::tidy()
to get a table of coefficients for each dv, and purrr::map_dfr()
to iterate over dvs, row-bind the coefficient tables, and add a column with the dv
for each model:
library(broom)
library(purrr)
f1 <- function(X) {
tidy(lm(
as.formula(paste(X, "~ mpg * cyl")),
data = mtcars
))
}
model_results <- map_dfr(
set_names(names(mtcars)[3:11]),
f1,
.id = "dv"
)
model_results
Output:
# A tibble: 36 x 6
dv term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 disp (Intercept) -242. 154. -1.57 0.128
2 disp mpg 10.3 6.47 1.59 0.123
3 disp cyl 103. 22.9 4.52 0.000104
4 disp mpg:cyl -3.24 1.18 -2.75 0.0104
5 hp (Intercept) -86.5 123. -0.704 0.487
6 hp mpg 4.59 5.16 0.889 0.381
7 hp cyl 50.3 18.2 2.75 0.0102
8 hp mpg:cyl -1.47 0.940 -1.57 0.128
9 drat (Intercept) 3.34 1.28 2.61 0.0145
10 drat mpg 0.0541 0.0538 1.01 0.323
# ... with 26 more rows
If you want dvs in rows and coefficients in columns, you can tidyr::pivot_wider()
:
library(tidyr)
model_coefs <- pivot_wider(
model_results,
id_cols = dv,
names_from = term,
values_from = estimate
)
model_coefs
Output:
# A tibble: 9 x 5
dv `(Intercept)` mpg cyl `mpg:cyl`
<chr> <dbl> <dbl> <dbl> <dbl>
1 disp -242. 10.3 103. -3.24
2 hp -86.5 4.59 50.3 -1.47
3 drat 3.34 0.0541 -0.0354 -0.00533
4 wt 2.98 -0.00947 0.478 -0.0219
5 qsec 25.0 -0.0938 -0.862 0.000318
6 vs 2.38 -0.0194 -0.292 0.00223
7 am -0.908 0.0702 0.0721 -0.00470
8 gear 4.22 0.0115 -0.181 0.00311
9 carb 3.32 -0.0830 0.249 -0.00333
Upvotes: 1