Pablo

Reputation: 41

How can I run 6084 regressions using the lapply function?

Hi, I am starting to use R and am stuck on analyzing my data. I have a data frame with 157 columns. Column 1 is the dependent variable and columns 2 to 157 are the independent variables: columns 2 to 79 are one type of independent variable (n = 78) and columns 80 to 157 are another type (n = 78). I want to fit 78 x 78 = 6084 multiple linear regressions, holding the first independent variable of the model fixed, one at a time, for each of columns 2 to 79. I can fix the independent variable and run the regressions separately like this:

lm(Grassland$column1 ~ Grassland$column2 +  x)
lm(Grassland$column1 ~ Grassland$column3 +  x)

lm(Grassland$column1 ~ Grassland$column79 +  x)

My question is: how can I run all 6084 regressions with a single piece of code and keep only the regressions whose p-value is < 0.05, discarding the non-significant ones?

Here is my code

library(data.table)

Regressions <- 
data.table(Grassland)[, 
                      .(Lm = lapply(.SD, function(x) summary(lm(Grassland$column1 ~ Grassland$column2 + x)))), .SDcols = 80:157]

Regressions[, lapply(Lm, function(x) coef(x)[, "Pr(>|t|)"])] [2:3] < 0.05       

Upvotes: 1

Views: 434

Answers (2)

akrun

Reputation: 887571

We can also use reformulate to build each formula and then fit it with lm:

lapply(setdiff(names(mtcars), "mpg"), function(x) 
        lm(reformulate(x, "mpg"), data = mtcars))
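
The same idea can be stretched to the two-predictor models in the question. The following is only a sketch: group_a and group_b are hypothetical stand-ins (built from mtcars columns) for columns 2:79 and 80:157 of the asker's data frame, and expand.grid is used to enumerate every pairing.

```r
# Hypothetical split of mtcars predictors into two groups, standing in for
# columns 2:79 and 80:157 of the asker's data frame.
group_a <- c("cyl", "disp", "hp")
group_b <- c("drat", "wt", "qsec")

# Every (group_a, group_b) pair: 3 x 3 = 9 rows here, 78 x 78 = 6084 in the question.
pairs <- expand.grid(a = group_a, b = group_b, stringsAsFactors = FALSE)

# Fit mpg ~ a + b for each pair; reformulate() builds the formula from the names.
models <- Map(function(a, b) lm(reformulate(c(a, b), response = "mpg"), data = mtcars),
              pairs$a, pairs$b)

# Label each fit by its pair of predictors so results are easy to look up.
names(models) <- paste(pairs$a, pairs$b, sep = " + ")

length(models)
```

With the real data, group_a and group_b would presumably be names(Grassland)[2:79] and names(Grassland)[80:157], with the name of column 1 as the response.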

Upvotes: 0

r2evans

Reputation: 160637

First, data.table isn't necessarily going to help you here; a plain lapply outside of it works fine. We generate the formulas programmatically (here I'll use most of mtcars), then fit each formula to the data.

paste("mpg ~", setdiff(names(mtcars), "mpg"))
#  [1] "mpg ~ cyl"  "mpg ~ disp" "mpg ~ hp"   "mpg ~ drat" "mpg ~ wt"   "mpg ~ qsec" "mpg ~ vs"  
#  [8] "mpg ~ am"   "mpg ~ gear" "mpg ~ carb"

regressions <- lapply(paste("mpg ~", setdiff(names(mtcars), "mpg")),
                      function(frm) lm(as.formula(frm), data=mtcars))

regressions[1:2]
# [[1]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept)          cyl  
#      37.885       -2.876  
# [[2]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept)         disp  
#    29.59985     -0.04122  
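
To keep only the significant fits, one option is to pull the coefficient p-values out of summary() and subset the list. This is a sketch under one reading of the question: "significant" is taken to mean that every non-intercept coefficient has p < 0.05, and coef_pvalues is a hypothetical helper name, not part of any package.

```r
# Same list of fits as above, rebuilt so this snippet runs on its own.
regressions <- lapply(paste("mpg ~", setdiff(names(mtcars), "mpg")),
                      function(frm) lm(as.formula(frm), data = mtcars))

# Hypothetical helper: p-values of the non-intercept coefficients of one fit.
coef_pvalues <- function(fit) {
  coefs <- coef(summary(fit))     # matrix with a "Pr(>|t|)" column
  coefs[-1, "Pr(>|t|)"]           # drop the intercept row
}

# TRUE for fits where every predictor is significant at the 0.05 level
# (an assumed criterion; the question does not say which p-value is meant).
keep <- vapply(regressions,
               function(fit) all(coef_pvalues(fit) < 0.05),
               logical(1))

significant <- regressions[keep]
length(significant)
```

The same filter should apply unchanged to a list of two-predictor models, since coef_pvalues returns one p-value per predictor.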

Upvotes: 0
