Reputation: 237

How to display only the top 10 strongest correlated variables with corrplot() in R?

I have > 100 variables and would like to understand how they are correlated with each other. I would like to do this using the corrplot() function from the corrplot package.

corrplot() can reorder the displayed variables so that strongly correlated variables end up next to each other. Setting the parameter order="hclust" (hierarchical clustering) achieves this:

library(corrplot)
corrplot(cor(df), order="hclust", type="upper") # df = data.frame object

Problem: The corrplot will contain all > 100 variables and is therefore not readable. I am looking for a way to display the 10 most strongly correlated variables in one corrplot, variables 11-20 in another corrplot, and so on. I am grateful for your tips and advice. Thanks a lot in advance.
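
One way to read "top 10, then 11-20" is to rank each variable by its strongest absolute correlation with any other variable and then plot the ranked variables in chunks of 10. The sketch below assumes every column of df is numeric; rank_by_max_abs_cor(), chunk_variables() and plot_chunks() are hypothetical helpers, and only corrplot() comes from the corrplot package.

```r
# Hypothetical helpers: rank variables by their strongest absolute correlation
# with any other variable, then plot them in chunks of `chunk_size`.
rank_by_max_abs_cor <- function(df) {
  m <- cor(df)                 # assumes every column of df is numeric
  diag(m) <- NA                # ignore each variable's correlation with itself
  strength <- apply(abs(m), 1, max, na.rm = TRUE)
  names(sort(strength, decreasing = TRUE))
}

chunk_variables <- function(vars, chunk_size = 10) {
  # chunk 1 = variables 1-10, chunk 2 = variables 11-20, ...
  split(vars, ceiling(seq_along(vars) / chunk_size))
}

plot_chunks <- function(df, chunk_size = 10) {
  m <- cor(df)
  for (vars in chunk_variables(rank_by_max_abs_cor(df), chunk_size)) {
    if (length(vars) < 2) next # clustering needs at least two variables
    corrplot::corrplot(m[vars, vars], order = "hclust", type = "upper")
  }
}
```

Calling plot_chunks(df) would draw one corrplot per chunk. Keep in mind that each plot only shows correlations among variables of the same chunk, so a variable's strongest partner may end up in a different plot.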

Upvotes: 3

Answers (2)

AriadnaAgnis

Reputation: 21

Although I'm one year late, I'll leave this here in case someone else needs this simple and beautiful solution:

Install lares from GitHub

devtools::install_github("laresbernardo/lares")

Bar chart with the top correlations in the dataset

library(lares)
corr_cross(data_frame,        # dataset
           max_pvalue = 0.05, # show only significant correlations at the selected level
           top = 10)          # display the top 10 correlations, any pair of variables
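
If installing lares is not an option, the ranking behind such a chart can be approximated in base R. This is a sketch, not lares' implementation: top_cor_pairs() is a hypothetical helper that ranks each pair of variables by absolute correlation and does not apply a p-value filter like max_pvalue.

```r
# Hypothetical helper: the `top` variable pairs with the largest |r|.
top_cor_pairs <- function(df, top = 10) {
  m <- cor(df)                     # assumes every column of df is numeric
  keep <- upper.tri(m)             # each pair once, diagonal excluded
  pairs <- data.frame(
    var1 = rownames(m)[row(m)[keep]],
    var2 = colnames(m)[col(m)[keep]],
    r    = m[keep],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]  # strongest first
  rownames(pairs) <- NULL
  head(pairs, top)
}
```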

Bar chart with the top correlations focused on only one variable (happy)

corr_var(data_frame, # dataset
         happy,      # name of the variable to focus on
         top = 10)   # display the top 10 correlations
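
Similarly, a rough base-R counterpart of corr_var() for a single target variable might look like this. top_cor_with() is a hypothetical helper; unlike corr_var() it takes the variable name as a string and skips any significance testing.

```r
# Hypothetical helper: the `top` variables most strongly correlated with `var`.
top_cor_with <- function(df, var, top = 10) {
  r <- cor(df)[, var]              # correlation of every column with `var`
  r <- r[names(r) != var]          # drop var's correlation with itself
  head(r[order(abs(r), decreasing = TRUE)], top)
}
```

For example, top_cor_with(data_frame, "happy") would return a named vector of the 10 strongest correlations with happy, positive or negative.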

Upvotes: 2

akrun

Reputation: 887213

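We can reshape the correlation matrix into a long table of variable pairs, remove the duplicate (mirrored) pairs and the diagonal, arrange the correlation values in descending order, and then create a grouping variable that puts every block of n1 pairs into its own group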

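library(tidyverse)  # dplyr, purrr, ...
n1 <- 10            # number of variable pairs per group
m1 <- cor(df)       # df = data.frame with numeric columns
out <- as.data.frame(as.table(m1), responseName = "n", stringsAsFactors = FALSE) %>%
        # put each pair in a canonical order so (A, B) and (B, A) coincide
        transmute(Var1N = pmin(Var1, Var2), Var2N = pmax(Var1, Var2), n) %>% 
        distinct %>%                  # drop the mirrored duplicates
        filter(Var1N != Var2N) %>%    # drop the diagonal (r = 1)
        arrange(desc(abs(n))) %>%     # strongest first, positive or negative
        group_by(grp = as.integer(gl(n(), n1, n())))  # blocks of n1 pairs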
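
The grp column relies on gl(): gl(n(), n1, n()) builds a factor whose levels are each repeated n1 times, truncated to the number of rows, so consecutive blocks of n1 pairs share a group. A quick base-R check, assuming 25 rows and n1 = 10, shows that this matches the more explicit ceiling(row number / n1):

```r
n_rows <- 25   # e.g. 25 variable pairs after arranging
n1 <- 10       # pairs per group
grp_gl <- as.integer(gl(n_rows, n1, n_rows))
grp_ceiling <- as.integer(ceiling(seq_len(n_rows) / n1))
print(table(grp_gl))   # rows per group: 10, 10 and 5
stopifnot(identical(grp_gl, grp_ceiling))
```

So group_by(grp = ceiling(row_number() / n1)) should be an equivalent, arguably more readable, way to write that last step.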

Based on the grouping variable, we can create a separate corrplot for each group

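library(corrplot)
# Return NA instead of stopping when a group cannot be plotted
posplt <- possibly(function(x) 
           corrplot(x, order = "hclust", type = "upper"), otherwise = NA)
pdf("corplt.pdf")
out[1:3] %>%                  # columns Var1N, Var2N, n
      split(out$grp) %>%      # one data frame per group
      map(~ xtabs(n ~ Var1N + Var2N, .x) %>% 
                 posplt) 
dev.off()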
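
Note that xtabs(n ~ Var1N + Var2N, .x) gives a table whose rows and columns are generally different sets of variables, with 0 for pairs outside the group, so it is not a square correlation matrix; that is presumably why the plot call is wrapped in possibly(). A sketch of an alternative, assuming out has the Var1N, Var2N and grp columns built above, is to take the square sub-matrix of m1 covering all variables that appear in each group. submatrices_by_group() and plot_groups() are hypothetical helpers.

```r
# Hypothetical helpers: one square correlation sub-matrix per group of pairs.
submatrices_by_group <- function(m, pairs) {
  pairs <- as.data.frame(pairs)     # drop any dplyr grouping
  lapply(split(pairs, pairs$grp), function(g) {
    vars <- unique(c(as.character(g$Var1N), as.character(g$Var2N)))
    m[vars, vars, drop = FALSE]     # square, with the original r values
  })
}

plot_groups <- function(m, pairs, file = "corplt.pdf") {
  pdf(file)
  on.exit(dev.off())                # close the PDF even if a plot fails
  for (sub in submatrices_by_group(m, pairs)) {
    corrplot::corrplot(sub, order = "hclust", type = "upper")
  }
}
```

Usage would be plot_groups(m1, out). Each plot then also shows correlations between variables of the group that were not themselves among its n1 pairs.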

Upvotes: 1
