GeekCat

Linear models in R for all variables in a data.frame, return a matrix of results

Hi all,

I am trying to fit linear models for several variables and report all the R-squared values.

Is there a way of doing this in one go, rather than one pair at a time?

For example, I know how to do it for two variables:

data(mtcars)
fit <- lm(formula = mtcars[, 1] ~ mtcars[, 2])
summary(fit)$r.squared

mtcars has 11 numeric variables. Is there a way of doing this for all of them? Since there are 11 variables, I want to record every pairwise R-squared value in an 11-by-11 matrix that is symmetric, with 0s on the diagonal.

Answers (2)

eipi10

Because these are simple linear regressions (one predictor plus an intercept), the R-squared of each model is just the square of the Pearson correlation coefficient between the two variables, so you can do this:

rsq = cor(mtcars)^2
diag(rsq) = 0  # To get zeros on the diagonals

Here are the first 3 rows and columns:

> rsq[1:3, 1:3]
           mpg       cyl      disp
mpg  0.0000000 0.7261800 0.7183433
cyl  0.7261800 0.0000000 0.8136633
disp 0.7183433 0.8136633 0.0000000
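
If you want to convince yourself that the shortcut matches lm(), a quick check on a single pair is enough. This is a sketch using only base R; the variable names r2_lm and r2_cor are just for illustration.

```r
data(mtcars)

# R-squared from an explicit simple regression of mpg on cyl
r2_lm <- summary(lm(mpg ~ cyl, data = mtcars))$r.squared

# Squared Pearson correlation between the same two variables
r2_cor <- cor(mtcars$mpg, mtcars$cyl)^2

# Equal up to floating-point tolerance
all.equal(r2_lm, r2_cor)  # TRUE
```

Both values should be about 0.72618, the mpg/cyl entry in the table above. Note that the equivalence only holds for a single predictor with an intercept; with several predictors you would need lm() itself.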

By the way, you might find the corrplot package useful for visualizing the R-squared values. The package is really intended for correlations rather than squared correlations, but it's an easy way to see at a glance which pairs of variables have the strongest relationships. A more general heatmap works as well, but corrplot provides tools focused specifically on correlation matrices.

library(corrplot)

corrplot.mixed(cor(mtcars)^2) 

# Or, to sort the column order by clustering
corrplot.mixed(cor(mtcars)^2, order="hclust")

See the corrplot vignette for more information.
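
For the more general heatmap route, a minimal sketch with base R's heatmap() (from the stats package) could look like the following. This is an alternative to corrplot, not something the answer above used, and the margins value is just a guess that leaves room for the variable names.

```r
data(mtcars)

# Squared correlations; the diagonal is left at 1 here
rsq <- cor(mtcars)^2

# symm = TRUE treats the matrix as symmetric: rows and columns share one
# dendrogram ordering, and scale then defaults to "none", so raw R^2 is shown
hm <- heatmap(rsq, symm = TRUE, margins = c(6, 6))
```

heatmap() returns (invisibly) the row and column orderings it used, which can be handy if you want to reorder the matrix the same way yourself.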

akrun

You can use outer:

res1 <- outer(colnames(mtcars), colnames(mtcars), FUN = function(x, y) {
  # outer() passes all pairs at once, so loop over the pasted formulas
  sapply(as.list(paste(x, y, sep = "~")), function(z) {
    form1 <- as.formula(z)
    fit <- lm(form1, data = mtcars)
    summary(fit)$r.squared
  })
})
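
The same outer idea can be written with a scalar helper wrapped in Vectorize(), since outer() calls FUN once with all 121 pairs expanded into two vectors. In the code above, diagonal entries come from formulas such as mpg ~ mpg; as far as I know lm() drops the response from the right-hand side (with a warning) and fits an intercept-only model, whose R-squared is 0. The sketch below returns 0 for the diagonal directly instead. r2_pair is a hypothetical helper name; reformulate() and Vectorize() are base R.

```r
data(mtcars)
vars <- colnames(mtcars)

# Hypothetical helper: R-squared of lm(response ~ predictor) for one pair.
# The diagonal is set to 0 directly instead of fitting e.g. mpg ~ mpg.
r2_pair <- function(response, predictor) {
  if (response == predictor) return(0)
  fit <- lm(reformulate(predictor, response = response), data = mtcars)
  summary(fit)$r.squared
}

# outer() expects a vectorised FUN, so wrap the scalar helper
res1 <- outer(vars, vars, FUN = Vectorize(r2_pair))
dimnames(res1) <- list(vars, vars)  # label rows (response) and columns (predictor)
round(res1[1:3, 1:3], 7)
```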

Alternatively, you can use expand.grid:

indx <- expand.grid(colnames(mtcars), colnames(mtcars), stringsAsFactors = FALSE)
res2 <- sapply(seq_len(nrow(indx)), function(i) {
  i1 <- indx[i, ]
  form1 <- as.formula(paste(i1[, 1], i1[, 2], sep = "~"))
  fit <- lm(formula = form1, data = mtcars)
  summary(fit)$r.squared
})

# expand.grid varies the first column fastest, so fill column by column
dim(res2) <- c(11, 11)
res2[1:3, 1:3]
#           [,1]      [,2]      [,3]
# [1,] 0.0000000 0.7261800 0.7183433
# [2,] 0.7261800 0.0000000 0.8136633
# [3,] 0.7183433 0.8136633 0.0000000

identical(res1, res2)
# [1] TRUE
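
Since the result is symmetric here (for a single predictor, y ~ x and x ~ y have the same R-squared), one possible refinement is to fit only the 55 unordered pairs with combn() and fill both triangles. This is a sketch, not part of the answers above; r2 and res3 are illustrative names.

```r
data(mtcars)
vars <- colnames(mtcars)
pairs <- combn(vars, 2)  # 2 x 55 character matrix of unordered pairs

# One model per unordered pair: first variable as response, second as predictor
r2 <- apply(pairs, 2, function(p) {
  summary(lm(reformulate(p[2], response = p[1]), data = mtcars))$r.squared
})

# Start from zeros (diagonal stays 0) and fill both triangles
# using character-matrix indexing against the dimnames
res3 <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
res3[t(pairs)] <- r2
res3[t(pairs[2:1, ])] <- r2
res3[1:3, 1:3]
```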
