Reputation: 4309
Using cor(mtcars, method='pearson') produces a matrix of Pearson correlations between every pair of variables in mtcars, e.g.:
head(cor(mtcars, method='pearson'))
mpg cyl disp hp drat wt qsec vs am gear
mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.41868403 0.6640389 0.5998324 0.4802848
cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.6999381 0.7824958 -0.59124207 -0.8108118 -0.5226070 -0.4926866
disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.7102139 0.8879799 -0.43369788 -0.7104159 -0.5912270 -0.5555692
hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.4487591 0.6587479 -0.70822339 -0.7230967 -0.2432043 -0.1257043
drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.0000000 -0.7124406 0.09120476 0.4402785 0.7127111 0.6996101
wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.7124406 1.0000000 -0.17471588 -0.5549157 -0.6924953 -0.5832870
carb
mpg -0.5509251
cyl 0.5269883
disp 0.3949769
hp 0.7498125
drat -0.0907898
wt 0.4276059
How can I get the same matrix as above, except that each value is the r.squared value from a linear model rather than the Pearson correlation between the two variables? For example, the entry in the first column, second row would be the same as summary(lm(mtcars$mpg ~ mtcars$cyl))$r.squared. Thank you.
Upvotes: 0
Views: 1107
Reputation: 2289
I created a corlm function that fills the entries with a for loop:
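corlm <- function(df) {
  n <- ncol(df)
  # Square matrix of R-squared values, labelled by column name on both axes
  mat <- matrix(NA_real_, n, n, dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next  # skip self-pairs: they stay NA and avoid perfect-fit warnings
      mat[i, j] <- summary(lm(df[, j] ~ df[, i]))$r.squared
    }
  }
  mat
}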
round(corlm(mtcars),3)
mpg cyl disp hp drat wt qsec vs am gear carb
mpg NA 0.726 0.718 0.602 0.464 0.753 0.175 0.441 0.360 0.231 0.304
cyl 0.726 NA 0.814 0.693 0.490 0.612 0.350 0.657 0.273 0.243 0.278
disp 0.718 0.814 NA 0.626 0.504 0.789 0.188 0.505 0.350 0.309 0.156
hp 0.602 0.693 0.626 NA 0.201 0.434 0.502 0.523 0.059 0.016 0.562
drat 0.464 0.490 0.504 0.201 NA 0.508 0.008 0.194 0.508 0.489 0.008
wt 0.753 0.612 0.789 0.434 0.508 NA 0.031 0.308 0.480 0.340 0.183
qsec 0.175 0.350 0.188 0.502 0.008 0.031 NA 0.554 0.053 0.045 0.431
vs 0.441 0.657 0.505 0.523 0.194 0.308 0.554 NA 0.028 0.042 0.324
am 0.360 0.273 0.350 0.059 0.508 0.480 0.053 0.028 NA 0.631 0.003
gear 0.231 0.243 0.309 0.016 0.489 0.340 0.045 0.042 0.631 NA 0.075
carb 0.304 0.278 0.156 0.562 0.008 0.183 0.431 0.324 0.003 0.075 NA
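As a sanity check on the loop above: for a linear model with a single predictor and an intercept, R-squared equals the squared Pearson correlation, so squaring cor(mtcars) should give the same matrix without fitting any models. A minimal sketch, where the name r2 is my own and not from the question:

```r
# For a one-predictor linear model with an intercept, R^2 is the squared
# Pearson correlation, so squaring the correlation matrix covers every pair at once.
r2 <- cor(mtcars, method = "pearson")^2
diag(r2) <- NA  # match corlm(), which leaves the diagonal as NA

# Spot-check one entry against the fitted model from the question
all.equal(r2["cyl", "mpg"], summary(lm(mpg ~ cyl, data = mtcars))$r.squared)

round(r2, 3)
```

This shortcut only holds for pairwise simple regressions like the ones here. With more than one predictor, or a model without an intercept, you would still need to fit the models.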
Upvotes: 2
Reputation: 16121
library(tidyverse)
# keep the names of the dataset
names = names(mtcars)
expand.grid(names, names, stringsAsFactors = FALSE) %>% # create pairs of names
filter(Var1 != Var2) %>% # exclude self-pairs (fitting a variable on itself creates warnings)
rowwise() %>% # for each row
mutate(r = summary(lm(paste(Var1, "~" ,Var2), data = mtcars))$r.squared) %>% # get the r squared
spread(Var2, r) # reshape
# # A tibble: 11 x 12
# Var1 am carb cyl disp drat gear hp mpg
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 am NA 0.00331 0.273 0.350 0.508 0.631 0.0591 0.360
# 2 carb 0.00331 NA 0.278 0.156 0.00824 0.0751 0.562 0.304
# 3 cyl 0.273 0.278 NA 0.814 0.490 0.243 0.693 0.726
# 4 disp 0.350 0.156 0.814 NA 0.504 0.309 0.626 0.718
# 5 drat 0.508 0.00824 0.490 0.504 NA 0.489 0.201 0.464
# 6 gear 0.631 0.0751 0.243 0.309 0.489 NA 0.0158 0.231
# 7 hp 0.0591 0.562 0.693 0.626 0.201 0.0158 NA 0.602
# 8 mpg 0.360 0.304 0.726 0.718 0.464 0.231 0.602 NA
# 9 qsec 0.0528 0.431 0.350 0.188 0.00832 0.0452 0.502 0.175
# 10 vs 0.0283 0.324 0.657 0.505 0.194 0.0424 0.523 0.441
# 11 wt 0.480 0.183 0.612 0.789 0.508 0.340 0.434 0.753
# # ... with 3 more variables: qsec <dbl>, vs <dbl>, wt <dbl>
If you want row names instead of the first column (Var1), you can add the following to the end of the pipeline above:
... %>%
data.frame() %>%
column_to_rownames("Var1")
That would be closer to the output you get from cor(mtcars, method='pearson').
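For reference, tidyr now marks spread() as superseded in favour of pivot_wider(), so here is a hedged sketch of the same pipeline in that style. The names vars, r2_long and r2_wide are my own, and I use reformulate() instead of paste() to build the formula. Treat it as one possible variant rather than the definitive way to do it:

```r
library(dplyr)
library(tidyr)
library(tibble)

vars <- names(mtcars)

# One row per ordered pair of distinct variables, with the R-squared of Var1 ~ Var2
r2_long <- expand.grid(Var1 = vars, Var2 = vars, stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2) %>%
  rowwise() %>%
  mutate(r = summary(lm(reformulate(Var2, response = Var1), data = mtcars))$r.squared) %>%
  ungroup()

# Reshape to wide form; the missing self-pairs become NA on the diagonal
r2_wide <- r2_long %>%
  pivot_wider(names_from = Var2, values_from = r) %>%
  column_to_rownames("Var1")

# Put rows and columns back in the original column order of mtcars
r2_wide <- r2_wide[vars, vars]
round(r2_wide, 3)
```

The final reordering step is there because the reshaped rows and columns otherwise follow the order in which the pairs first appear, not the order used by cor(mtcars).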
Upvotes: 4