niceguy

Reputation: 157

filtering for high correlation using cor()


Hello.
I came across the post "Filter correlation matrix R", which answers the first part of my question. The second reply there gives this code:

index <- which(x > .80 & x < 1, # your criteria
               arr.ind = TRUE) # which() now returns row and column indices
df <- cbind.data.frame(stock1 = rownames(x)[index[, 1]], # get the row name
                       stock2 = colnames(x)[index[, 2]]) # get the column name

However, this method returns every pair twice, once in each order, e.g. SPY/QQQ and QQQ/SPY.
How can I keep only one row per pair of tickers? Thanks.

Upvotes: 0

Views: 873

Answers (1)

Ronak Shah

Reputation: 389047

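You can set the upper (or lower) triangle of the correlation matrix, including the diagonal, to NA so that each pair is considered only once. Because which() skips NA values, the x < 1 condition is no longer needed to exclude the diagonal.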

x[upper.tri(x, diag = TRUE)] <- NA
index <- which(x > .80, arr.ind = TRUE)

df <- cbind.data.frame(stock1 = rownames(x)[index[,1]], 
                       stock2 = colnames(x)[index[,2]])
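
To see the effect on a small case, here is a minimal sketch using a made-up 3x3 correlation matrix. The tickers and correlation values are hypothetical and only for illustration; in practice x would come from cor() on your own returns. The extra cor column is an optional addition, not part of the code above.

```r
# Hypothetical correlation matrix; tickers and values are invented for illustration.
tickers <- c("SPY", "QQQ", "TLT")
x <- matrix(c( 1.00,  0.92, -0.30,
               0.92,  1.00, -0.25,
              -0.30, -0.25,  1.00),
            nrow = 3, dimnames = list(tickers, tickers))

# Blank the upper triangle and the diagonal so each pair is kept only once.
x[upper.tri(x, diag = TRUE)] <- NA

# which() ignores NA entries, so only lower-triangle cells can match.
index <- which(x > 0.80, arr.ind = TRUE)

# Build the result; x[index] looks up the correlation for each (row, col) pair.
df <- data.frame(stock1 = rownames(x)[index[, 1]],
                 stock2 = colnames(x)[index[, 2]],
                 cor    = x[index])
print(df)
```

With this input, df has a single row for QQQ/SPY; the mirrored SPY/QQQ entry is gone because it sat in the upper triangle.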

Upvotes: 1
