Reputation:
I have an input variable X, and I'm trying to extract the pairs of variables in X that are highly correlated with each other (correlation > 0.9). So far I'm using the cor function to calculate the correlations between the variables, but I can't see a clear way to get a list or data frame of the highly correlated pairs.
Upvotes: 2
Views: 1470
Reputation: 226087
which(..., arr.ind=TRUE) is the key.
Make up some data:
set.seed(101)
## 10 observations of 50 variables, named A1..Z1, A2..X2
X <- matrix(rnorm(500), nrow=10,
            dimnames=list(NULL, outer(LETTERS,1:2,paste0)[1:50]))
cc <- cor(X)
range(cc[cc<1])
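shows off-diagonal values from -0.82 to 0.87. Nothing in this made-up data reaches the 0.9 cutoff from the question, so I'll select values with abs(cc)>0.8 instead. Adding row(cc) < col(cc) keeps only the upper triangle, so each pair is reported once and the diagonal (each variable's correlation with itself) is excluded.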
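To see what the upper-triangle condition does on its own, here is a minimal sketch on a toy 3x3 matrix. The names m and idx are made up for illustration; upper.tri() is the base R helper that should give the same logical mask.

```r
## toy matrix, only its shape matters here
m <- matrix(1:9, nrow = 3)

## TRUE strictly above the diagonal, FALSE elsewhere
mask <- row(m) < col(m)

## upper.tri() builds the same mask (diagonal excluded by default)
same_as_upper_tri <- all(mask == upper.tri(m))

## arr.ind=TRUE returns (row, col) positions instead of linear indices
idx <- which(mask, arr.ind = TRUE)
idx
```

Each row of idx is one (row, col) position above the diagonal, which is why every pair shows up only once.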
w <- which(abs(cc)>0.8 & row(cc)<col(cc), arr.ind=TRUE)
## reconstruct names from positions
high_cor <- matrix(colnames(cc)[w],ncol=2)
high_cor
     [,1] [,2]
[1,] "G1" "H1"
[2,] "F1" "N1"
[3,] "T1" "Z1"
[4,] "U1" "A2"
[5,] "Q1" "C2"
[6,] "M1" "O2"
Upvotes: 2