Reputation: 481

Summary statistics for simple linear regressions stored in a list using lapply

I have a series of simple linear regressions of the form y ~ x1, y ~x2, y~ x3 etc.

I have been able to run all my linear regressions and have stored the output, but I am having trouble to access the summary statistics in a vectorized way e.g. adjusted R squared for each model.

I could do it via a for loop and iterate through each model, but I think there must be easier way to perhaps use lapply (or sapply?) and get the result quicker.

A reproducible example is below:

library(tidyverse)
library(broom)

set.seed(6)

DF <- data.frame(Y=rnorm(50, 100, 3),
                 X1=rnorm(50, 100, 3),
                 X2=rnorm(50, 100, 3),
                 X3=rnorm(50, 100, 3),
                 X4=rnorm(50, 100, 3))

DF_longer = pivot_longer(DF, -Y, names_to = "variable", values_to = "value", values_ptypes = list(val = 'numeric'))

lm1 = DF_longer %>% group_by(variable) %>% do(tidy(lm(Y ~ value, data=.)))
lm2 = DF_longer %>% group_by(variable) %>% do(mod = lm(Y ~ value, data=.))

The part that I would like to optimise is the following where I would like to store the adjusted R squared for each model in a vector without a for loop.

lm2_data = summary(lm2$mod[[1]])
lm2_data$adj.r.squared

lm2_data = summary(lm2$mod[[2]])
lm2_data$adj.r.squared

lm2_data = summary(lm2$mod[[3]])
lm2_data$adj.r.squared

lm2_data = summary(lm2$mod[[4]])
lm2_data$adj.r.squared

Upvotes: 1

Answers (3)

jay.sf

Reputation: 72813

In base R you may use reformulate to get a formula vector fov, over which you do the regressions and simultaneously extract the adj. R² in an sapply.

fov <- lapply(names(DF)[2:5], reformulate, "Y")
res <- sapply(fov, function(x) summary(lm(x, data=DF))$adj)
res
# [1] -0.007637371  0.007729944  0.049935424 -0.020262353

In a single step do:

sapply(lapply(names(DF)[2:5], reformulate, "Y"), function(x) summary(lm(x, data=DF))$adj)
# [1] -0.007637371  0.007729944  0.049935424 -0.020262353

Note: You stated you want a vector, if you want a list just replace the sapply with lapply.

Upvotes: 0

Len Greski

Reputation: 10855

Here's an example using the mtcars data:

regModels <- c("mpg ~ am", "mpg ~ am + wt", "mpg ~ wt + am + disp")

results <- lapply(regModels,function(x){
     y <- summary(lm(x,data = mtcars))$adj.r.squared

})

names(results) <- regModels

results

...and the output:

> results
$`mpg ~ am`
[1] 0.3384589

$`mpg ~ am + wt`
[1] 0.7357889

$`mpg ~ wt + am + disp`
[1] 0.757583

>

Using data from the original post...

library(tidyverse)
library(broom)
set.seed(6)

DF <- data.frame(Y=rnorm(50, 100, 3),
                 X1=rnorm(50, 100, 3),
                 X2=rnorm(50, 100, 3),
                 X3=rnorm(50, 100, 3),
                 X4=rnorm(50, 100, 3))

DF_longer = pivot_longer(DF, -Y, names_to = "variable", values_to = "value", values_ptypes = list(val = 'numeric'))

lm1 = DF_longer %>% group_by(variable) %>% do(tidy(lm(Y ~ value, data=.)))
lm2 = DF_longer %>% group_by(variable) %>% do(mod = lm(Y ~ value, data=.))

adjRsquared <- lapply(lm2$mod,function(x){
     y <- summary(x)$adj.r.squared
})
names(adjRsquared) <- lm2$variable
adjRsquared

...and the output:

> adjRsquared
$X1
[1] -0.007637371

$X2
[1] 0.007729944

$X3
[1] 0.04993542

$X4
[1] -0.02026235

Upvotes: 1

Thomas Rosa

Reputation: 732

The sapply() function executes a function on each element of a list and gives you a vector:

sapply(lm2$mod, function(x) summary(x)$adj.r.squared)

Upvotes: 0

Summary statistics for simple linear regressions stored in a list using lapply

Answers (3)

Using data from the original post...

Related Questions