Reputation: 361
I'm currently performing multiple linear regression analyses across a range of dependent variables (almost 200) and would like to create a function that runs this for a specified set of columns, then extracts relelvant model estimates, e.g. Beta-coefficients and p-values.
Simulated data:
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
education = as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))
Which looks like:
ID age sex testscore_1 testscore_2 testscore_3 education
1 1001 56 0 23 1 18 5
2 1002 43 1 28 3 20 4
3 1003 59 0 30 2 19 3
4 1004 74 0 15 5 15 5
5 1005 61 1 7 8 20 2
6 1006 62 1 18 2 23 1
7 1007 69 0 29 5 19 4
8 1008 80 1 27 6 25 4
9 1009 40 0 14 7 10 3
10 1010 55 1 22 8 14 5
11 1011 58 0 24 2 12 2
I'm at a stage where I have a function that works:
lm_results <- lapply(df[,4:6], function(x) lm(x ~ age + sex + education,
data = df))
and I can derive coefficient estimates from this:
Coefficient <- data.frame(coefficients = sapply(lm_results, getElement, name = "coefficients"))
Which returns the coefficient for each predictor variable across each of the testscore_* variables, although I haven't been able to derive p-values from these models:
P_values <- data.frame(p.values = sapply(lm_results, getElement, name = "qr"))
Does anyone have any suggestions for resolving this?
Upvotes: 1
Views: 126
Reputation: 26353
This can actually be done quite elegantly using cbind
and broom::tidy
lm_results <- lm(cbind(testscore_1, testscore_2, testscore_3) ~ age + sex + education, data = df)
broom::tidy(lm_results)
# A tibble: 12 x 6
# response term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 testscore_1 (Intercept) 14.9 14.5 1.03 0.339
# 2 testscore_1 age 0.0404 0.222 0.182 0.860
# 3 testscore_1 sex -1.47 5.09 -0.289 0.781
# 4 testscore_1 education 1.42 1.96 0.725 0.492
# 5 testscore_2 (Intercept) 1.83 4.93 0.371 0.721
# 6 testscore_2 age 0.00423 0.0752 0.0562 0.957
# 7 testscore_2 sex 1.93 1.73 1.12 0.301
# 8 testscore_2 education 0.432 0.664 0.651 0.536
# 9 testscore_3 (Intercept) 5.43 6.34 0.857 0.420
#10 testscore_3 age 0.192 0.0969 1.98 0.0882
#11 testscore_3 sex 4.57 2.23 2.05 0.0794
#12 testscore_3 education -0.359 0.856 -0.420 0.687
From ?lm
If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.
Since you deal with many more variables, try
y <- as.matrix(df[startsWith(names(df), "testscore")])
lm_results <- lm(y ~ age + sex + education, data = df)
Assuming that the names of all your dependent variables start with "testscore".
Upvotes: 3
Reputation: 4873
Similar to @markus's answer, using broom
package but through piping.
require(tidyverse)
require(broom)
df %>%
gather(var, value, -ID, -age, -sex, -education) %>%
nest(-var) %>%
mutate(model = purrr::map(data, function(x) {
lm(value ~ age + sex + education, data = x)}),
values = purrr::map(model, tidy)) %>%
select(-data)%>%
unnest(values)
var term estimate std.error statistic p.value
1 testscore_1 (Intercept) 14.899383690 14.50707597 1.02704251 0.33857568
2 testscore_1 age 0.040404308 0.22161068 0.18232112 0.86049842
3 testscore_1 sex -1.472779643 5.09169384 -0.28925141 0.78076814
4 testscore_1 education 1.419080194 1.95702802 0.72512002 0.49190076
5 testscore_2 (Intercept) 1.829852912 4.92563999 0.37149546 0.72125796
6 testscore_2 age 0.004230513 0.07524428 0.05622371 0.95673475
7 testscore_2 sex 1.931496405 1.72880123 1.11724608 0.30076331
8 testscore_2 education 0.432491820 0.66447680 0.65087572 0.53589927
9 testscore_3 (Intercept) 5.434355575 6.34277671 0.85677864 0.41992820
10 testscore_3 age 0.191860896 0.09689251 1.98014159 0.08816340
11 testscore_3 sex 4.565962111 2.22618791 2.05102278 0.07941042
12 testscore_3 education -0.359482384 0.85565084 -0.42012743 0.68698792
Upvotes: 1