Finding the top 10 most correlated features in a data set in R

I have a very large data set. I need to find out which variables are most strongly correlated with each other.

My code below shows all the correlations, but I have 69 columns, so it is impractical to check every value by hand (not literally impossible, but I'm sure you understand what I mean):

File: CW_ModelDevelopment (select only numeric columns)

library(dplyr)

CW_ModelDevelopment %>%
  # Select numeric columns
  select_if(is.numeric) %>%
  # Calculate the correlation matrix
  cor()

Please can someone help me with code that shows the results as percentages, or that sets a condition so that any pair correlating above some amount x is printed? An example in Python of what I want is below:

Upvotes: 0

Answers (2)

Ronak Shah

Reputation: 389047

You can try:

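# TRUE wherever the correlation between two columns is above 0.9
mat <- cor(mtcars) > 0.9
# Ignore each variable's correlation with itself
diag(mat) <- FALSE
# Names of the columns that appear in at least one such pair
colnames(mtcars)[which(mat, arr.ind = TRUE)[, 2]]
#[1] "cyl"  "disp"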
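
If you also want to see which pairs exceed the cut-off and by how much, here is a minimal sketch along the same lines. It assumes you care about the absolute correlation (so strong negative relationships count too); the 0.9 cut-off and the names `threshold` and `high_pairs` are only illustrative.

```r
# Sketch: list each variable pair whose |correlation| exceeds a cut-off.
threshold <- 0.9                      # illustrative value, adjust as needed
cm <- cor(mtcars)
cm[lower.tri(cm, diag = TRUE)] <- NA  # keep each pair once and drop the diagonal
idx <- which(abs(cm) > threshold, arr.ind = TRUE)  # NA entries are skipped
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  cor  = cm[idx]                      # matrix indexing by (row, col) pairs
)
high_pairs
```

Multiplying the `cor` column by 100 would give the values as percentages, if that is the format you prefer.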

Upvotes: -1

Till

Reputation: 6628

If we treat the cor() output as an adjacency matrix for a network object, igraph can help us transform it into a data.frame structure in which each variable combination has its own row. Using dplyr::top_n() we can then see the ten largest correlations and their values. Because the correlation matrix is symmetric, each pair shows up twice in the result, once in each direction.

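library(tidyverse)
library(igraph, warn.conflicts = FALSE)

# Random example data: 20 rows, 50 columns
matrix(sample(1:10, 1000, replace = TRUE), 20, 50) %>%
  as_tibble(.name_repair = "universal") %>%
  cor() %>%
  # Treat the correlation matrix as a weighted adjacency matrix
  igraph::graph_from_adjacency_matrix(weighted = TRUE,
                                      diag = FALSE) %>%
  # One row per variable pair: from, to, weight
  igraph::as_data_frame() %>%
  # Keep the 10 rows with the largest weight
  top_n(10)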
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * ...
#> Selecting by weight
#>     from    to    weight
#> 1   ...3 ...34 0.7098358
#> 2   ...5 ...24 0.6054965
#> 3   ...9 ...16 0.6129791
#> 4  ...16  ...9 0.6129791
#> 5  ...21 ...38 0.6092931
#> 6  ...24  ...5 0.6054965
#> 7  ...33 ...42 0.6226324
#> 8  ...34  ...3 0.7098358
#> 9  ...38 ...21 0.6092931
#> 10 ...42 ...33 0.6226324
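
If you would rather avoid the extra dependency and the duplicated pairs, a base R variant is sketched below. This is an alternative, not the igraph approach above: it assumes you want to rank by absolute correlation, and the names `dat`, `pair_df` and `top10` are made up for the example.

```r
# Sketch: top 10 unique variable pairs by absolute correlation, base R only.
set.seed(1)                           # only to make the random demo reproducible
dat <- as.data.frame(matrix(sample(1:10, 1000, replace = TRUE), 20, 50))
cm <- cor(dat)
keep <- upper.tri(cm)                 # TRUE above the diagonal: each pair once
pair_df <- data.frame(
  var1 = rownames(cm)[row(cm)[keep]],
  var2 = colnames(cm)[col(cm)[keep]],
  cor  = cm[keep]
)
# Sort by strength of correlation, strongest first, and keep ten rows
top10 <- head(pair_df[order(-abs(pair_df$cor)), ], 10)
top10
```

For your own data you would replace `dat` with the numeric columns of your data frame.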

Upvotes: 2
