user15077915


Extract pairs of variables with high correlation

I have an input data set X and I'm trying to extract the pairs of variables in X that are highly correlated with each other (correlation > 0.9). So far I'm using the cor function to compute the correlation matrix, but I can't see a clear way to get a list or data frame of the pairs of variables whose correlation exceeds that threshold.

Upvotes: 2

Views: 1470

Answers (1)

Ben Bolker

Reputation: 226087

which(..., arr.ind=TRUE) is the key.

Make up some data:

set.seed(101)
X <- matrix(rnorm(500), nrow=10,
        dimnames=list(NULL, outer(LETTERS,1:2,paste0)[1:50]))
cc <- cor(X)

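range(cc[cc<1]) shows values from -0.82 to 0.87, so no pair in this made-up data reaches 0.9; I'll select values with abs(cc)>0.8 instead. row(cc) < col(cc) selects only values from the upper triangle, so each pair appears once and the diagonal is excluded.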

w <- which(abs(cc)>0.8 & row(cc)<col(cc), arr.ind=TRUE)
## reconstruct names from positions
high_cor <- matrix(colnames(cc)[w],ncol=2)
high_cor
     [,1] [,2]
[1,] "G1" "H1"
[2,] "F1" "N1"
[3,] "T1" "Z1"
[4,] "U1" "A2"
[5,] "Q1" "C2"
[6,] "M1" "O2"
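
Since the question asks for a data frame, here is a minimal sketch of one way to get there, carrying the correlation values along with the names. The names threshold, high_cor_df, var1, var2 and cor are my own choices for illustration, and upper.tri(cc) is used as a stand-in for row(cc) < col(cc); it picks out the same cells.

```r
## same made-up data as above
set.seed(101)
X <- matrix(rnorm(500), nrow = 10,
            dimnames = list(NULL, outer(LETTERS, 1:2, paste0)[1:50]))
cc <- cor(X)

## illustrative cutoff; the question uses 0.9, the example above uses 0.8
threshold <- 0.8

## positions of high correlations in the upper triangle only
w <- which(abs(cc) > threshold & upper.tri(cc), arr.ind = TRUE)

## one row per pair: both variable names plus the correlation itself
high_cor_df <- data.frame(
  var1 = rownames(cc)[w[, "row"]],
  var2 = colnames(cc)[w[, "col"]],
  cor  = cc[w],   # indexing by a two-column matrix picks out each (row, col) cell
  stringsAsFactors = FALSE
)

## strongest correlations (in absolute value) first
high_cor_df <- high_cor_df[order(-abs(high_cor_df$cor)), ]
high_cor_df
```

Changing threshold to 0.9 should give an empty data frame for this particular data, since the largest off-diagonal correlation is about 0.87.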

Upvotes: 2
