Reputation: 237
I have > 100 variables and would like to understand how they are correlated with each other. I would like to do this using the corrplot() function from the corrplot package. corrplot() offers the option to order the displayed variables so that the most strongly correlated variables get displayed in the top right of the corrplot. The parameter order="hclust" has to be set to achieve this:
library(corrplot)
corrplot(cor(df), order="hclust", type="upper") # df = data.frame object
Problem: The corrplot will contain all > 100 variables and is hence not readable. Therefore, I am looking for a way to display the top 10 most strongly correlated variables in one corrplot, then the top 11-20 in another corrplot, etc. I am grateful for your tips and advice. Thanks a lot in advance.
Upvotes: 3
Views: 3839
Reputation: 21
Although I'm one year late, I'll leave this here in case someone else needs this simple and beautiful solution:
Install lares from GitHub
devtools::install_github("laresbernardo/lares")
Barchart with top correlations in dataset
library(lares)
corr_cross(data_frame, # dataset
  max_pvalue = 0.05, # show only significant correlations at the selected level
  top = 10 # display the top 10 correlations between any pairs of variables
)
Barchart with top correlations focused on only one variable (happy)
corr_var(data_frame, # dataset
  happy, # name of variable to focus on
  top = 10 # display the top 10 correlations
)
Upvotes: 2
Reputation: 887213
We can create a grouping variable based on the correlation coefficient after arranging the correlation values in descending order and removing the duplicate elements:
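library(tidyverse)
library(corrplot) # needed for the corrplot() calls further down
n1 <- 10 # number of variable pairs per group
m1 <- cor(df)
out <- as.table(m1) %>%
  as_tibble %>% # long format with columns Var1, Var2, n; replaces the deprecated as_data_frame
  # put each pair in a fixed order so (a, b) and (b, a) become duplicates
  transmute(Var1N = pmin(Var1, Var2), Var2N = pmax(Var1, Var2), n) %>%
  distinct %>%
  filter(Var1N != Var2N) %>% # drop the diagonal (self-correlations)
  arrange(desc(n)) %>%
  group_by(grp = as.integer(gl(n(), n1, n()))) # consecutive groups of n1 pairs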
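To make the pair ranking easier to inspect, here is a minimal base R sketch of the same idea. It is an illustration, not the exact pipeline above: mtcars stands in for df, and the names pairs and r are made up for this example.

```r
# Assumption: mtcars stands in for the real, all-numeric data frame.
df <- mtcars
m1 <- cor(df)

# Take each unordered pair once by reading only the upper triangle.
idx <- which(upper.tri(m1), arr.ind = TRUE)
pairs <- data.frame(
  Var1 = rownames(m1)[idx[, "row"]],
  Var2 = colnames(m1)[idx[, "col"]],
  r    = m1[idx]
)

# Sort from the highest to the lowest correlation, then number groups of 10 pairs.
pairs <- pairs[order(-pairs$r), ]
pairs$grp <- ceiling(seq_len(nrow(pairs)) / 10)

head(pairs, 10) # the 10 highest-ranked pairs (group 1)
```

Like the pipeline above, this sorts on the signed coefficient, so strong negative correlations end up in the last groups. Sorting on abs(pairs$r) instead would rank them by strength regardless of sign.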
Based on the grouping variable, we can do the corrplot individually for each set of groups:
posplt <- possibly(function(x)
  corrplot(x, order = "hclust", type = "upper"), otherwise = NA)
pdf("corplt.pdf")
out[1:3] %>%
  split(out$grp) %>%
  map(~ xtabs(n ~ Var1N + Var2N, .x) %>%
        posplt)
dev.off()
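If the goal is one readable corrplot per block of 10 variables (rather than per block of 10 pairs), a different approach is to rank the variables and plot square sub-matrices. The sketch below is one possible reading of the question, not the method above: it assumes a variable's "strength" is its largest absolute correlation with any other variable, uses mtcars as a stand-in for df, and the names score, ranked and chunks are illustrative.

```r
library(corrplot)

# Assumption: mtcars stands in for the real, all-numeric data frame.
df <- mtcars
m <- cor(df, use = "pairwise.complete.obs")

# Score each variable by its strongest absolute correlation with another variable.
m_off <- abs(m)
diag(m_off) <- NA # ignore the self-correlation of 1
score <- apply(m_off, 1, max, na.rm = TRUE)

# Rank variables from strongest to weakest and cut the ranking into chunks of 10.
ranked <- names(sort(score, decreasing = TRUE))
chunk_size <- 10
chunks <- split(ranked, ceiling(seq_along(ranked) / chunk_size))

# One corrplot per chunk, each on its own PDF page.
pdf("corrplot_chunks.pdf")
for (vars in chunks) {
  if (length(vars) > 1) { # a single leftover variable cannot be plotted
    corrplot(m[vars, vars], order = "hclust", type = "upper")
  }
}
dev.off()
```

Because every sub-matrix is square and symmetric, order = "hclust" works on each chunk directly. The trade-off is that correlations between variables in different chunks are not shown.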
Upvotes: 1