Reputation: 41
Hi, I am starting to use R and am stuck on analyzing my data. I have a data frame with 157 columns. Column 1 is the dependent variable and columns 2 to 157 are the independent variables: columns 2 to 79 are one type of independent variable (n = 78) and columns 80 to 157 are another type (n = 78). I want to perform 78 x 78 = 6084 multiple linear regressions, keeping the first independent variable of the model fixed, one at a time, for each of columns 2 to 79. I can fix the independent variable and run the regressions separately like this:
lm(Grassland$column1 ~ Grassland$column2 + x)
lm(Grassland$column1 ~ Grassland$column3 + x)
lm(Grassland$column1 ~ Grassland$column79 + x)
My question is: how can I run all 6084 regressions with a single piece of code and keep only the regressions whose p-value is < 0.05, discarding the non-significant ones?
Here is my code
library(data.table)
Regressions <-
data.table(Grassland)[,
.(Lm = lapply(.SD, function(x) summary(lm(Grassland$column1 ~ Grassland$column2 + x)))), .SDcols = 80:157]
Regressions[, lapply(Lm, function(x) coef(x)[, "Pr(>|t|)"])] [2:3] < 0.05
Upvotes: 1
Views: 434
Reputation: 887571
We can also use reformulate to create a formula and then apply lm:
lapply(setdiff(names(mtcars), "mpg"), function(x)
lm(reformulate(x, "mpg"), data = mtcars))
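The same idea may extend to the two-predictor case in the question, since reformulate accepts a character vector of terms. Below is a minimal sketch on mtcars; the group_a and group_b vectors are hypothetical stand-ins for the asker's columns 2:79 and 80:157, and expand.grid is just one way to enumerate every pairing.

```r
# Hypothetical split of mtcars predictors into two groups, standing in for
# columns 2:79 and 80:157 of the asker's data.
group_a <- c("cyl", "disp", "hp")
group_b <- c("wt", "qsec", "drat")

# Every (a, b) combination: 3 x 3 = 9 here, 78 x 78 = 6084 in the question.
pairs <- expand.grid(a = group_a, b = group_b, stringsAsFactors = FALSE)

# Fit mpg ~ a + b for each pair; reformulate() builds the formula from names.
fits <- Map(function(a, b) lm(reformulate(c(a, b), response = "mpg"), data = mtcars),
            pairs$a, pairs$b)
names(fits) <- paste(pairs$a, pairs$b, sep = " + ")

length(fits)
```

With the real data, something like names(Grassland)[2:79] and names(Grassland)[80:157] could presumably replace the two groups, with the response set to the name of column 1.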
Upvotes: 0
Reputation: 160637
For one, data.table isn't necessarily going to help you here; this works fine with a plain external lapply. First we generate the formulas programmatically (here I'll use most of mtcars), then we fit a model for each formula on the data.
paste("mpg ~", setdiff(names(mtcars), "mpg"))
# [1] "mpg ~ cyl" "mpg ~ disp" "mpg ~ hp" "mpg ~ drat" "mpg ~ wt" "mpg ~ qsec" "mpg ~ vs"
# [8] "mpg ~ am" "mpg ~ gear" "mpg ~ carb"
regressions <- lapply(paste("mpg ~", setdiff(names(mtcars), "mpg")),
function(frm) lm(as.formula(frm), data=mtcars))
regressions[1:2]
# [[1]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
# [[2]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept) disp
# 29.59985 -0.04122
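To keep only the significant fits, one option is to pull a p-value out of each model and subset the list. The sketch below assumes "p-value" means the overall F-test p-value of each model; if the p-value of a particular coefficient is wanted instead, coef(summary(fit))[, "Pr(>|t|)"] would be the place to look. The helper name model_p is made up for this example.

```r
predictors <- setdiff(names(mtcars), "mpg")
regressions <- lapply(paste("mpg ~", predictors),
                      function(frm) lm(as.formula(frm), data = mtcars))
names(regressions) <- predictors

# Overall F-test p-value of a fitted lm (hypothetical helper).
# summary()$fstatistic holds the F value and its two degrees of freedom.
model_p <- function(fit) {
  f <- summary(fit)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

p_values <- vapply(regressions, model_p, numeric(1))

# Keep only the models below the 0.05 threshold from the question.
significant <- regressions[p_values < 0.05]
names(significant)
```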
Upvotes: 0