Reputation:
I am using the findCorrelation function from the caret package in R:
highCorr <- findCorrelation(correlations, cutoff = .60, names = FALSE)
The function returns the numbers (or names) of the columns whose correlation is 0.6 or above.
I want to remove these columns.
I don't know how to do this: if I remove them one at a time, the column numbers shift after each removal. I also want to try a few cutoff thresholds, so I would like to do this automatically.
Upvotes: 1
Views: 738
Reputation: 704
If you computed the correlation matrix from your original data, you can do the following:
library(caret) #findCorrelation comes from this library
set.seed(1)
#create simulated data for correlation matrix
mydata <- matrix(data = rnorm(100,mean = 100, sd = 3), nrow = 10, ncol = 10)
#create correlation matrix
correlations <- cor(mydata)
#index correlations at cutoff
corr_ind <- findCorrelation(correlations, cutoff = .2)
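#remove flagged columns from original data in one step (note the comma: index columns, not elements)
#setdiff() keeps all columns when nothing is flagged; drop = FALSE keeps the result a matrix
remove_corrs <- mydata[, setdiff(seq_len(ncol(mydata)), corr_ind), drop = FALSE]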
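Since you want to try several cutoffs, one option is to wrap the steps in a small function and apply it once per threshold. Removing all flagged columns in a single indexing step also avoids the shifting column numbers you get when dropping them one at a time. This is only a minimal sketch, assuming your data is a numeric matrix or data frame; drop_correlated is a hypothetical helper name, and the simulated data (with two deliberately correlated pairs) is just for illustration.

```r
library(caret)  # provides findCorrelation()

set.seed(1)
# Simulated data (assumption): 4 independent columns plus 2 near-copies
x <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
mydata <- cbind(x,
                x[, 1] + rnorm(50, sd = 0.1),
                x[, 2] + rnorm(50, sd = 0.1))
colnames(mydata) <- paste0("V", seq_len(ncol(mydata)))

# Hypothetical helper: drop the columns flagged at a given cutoff
drop_correlated <- function(data, cutoff) {
  flagged <- findCorrelation(cor(data), cutoff = cutoff)
  # Work with the columns to keep, so an empty 'flagged' drops nothing
  keep <- setdiff(seq_len(ncol(data)), flagged)
  data[, keep, drop = FALSE]
}

# Apply the helper to each cutoff; the result is a named list of reduced data sets
cutoffs <- c(0.6, 0.7, 0.8)
reduced <- lapply(cutoffs, function(k) drop_correlated(mydata, k))
names(reduced) <- paste0("cutoff_", cutoffs)

# Number of columns remaining at each cutoff
sapply(reduced, ncol)
```

Each element of reduced is the original data with the flagged columns removed for that cutoff, so you can compare them or feed each one to your model.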
Upvotes: 1