Reputation: 23
I have a correlation matrix in Excel, as follows:
dfA <- read.table(text=
"beta1 beta2 beta3 beta4 beta5 beta6 X X2 X3
beta1 1.0000 -0.2515 -0.2157 0.7209 -0.7205 0.4679 0.1025 -0.3606 -0.0356
beta2 -0.2515 1.0000 0.9831 0.1629 -0.1654 -0.5595 -0.0316 0.0946 0.0829
beta3 -0.2157 0.9831 1.0000 0.1529 -0.1559 -0.4976 -0.0266 0.0383 0.0738
beta4 0.7209 0.1629 0.1529 1.0000 -1.0000 -0.2753 0.0837 -0.1445 0.0080
beta5 0.4679 -0.5595 -0.4976 -0.2753 1.0000 0.2757 0.0354 -0.3149 -0.0596
beta6 -0.7205 -0.1654 -0.1559 -1.0000 0.2757 1.0000 -0.0837 0.1451 -0.0081
X 0.1025 -0.0316 -0.0266 0.0837 -0.0837 0.0354 1.0000 0.0278 -0.0875
X2 -0.3606 0.0946 0.0383 -0.1445 0.1451 -0.3149 0.0278 1.0000 0.2047
X3 -0.0356 0.0829 0.0738 0.0080 -0.0081 -0.0596 -0.0875 0.2047 1.0000",
header=TRUE)
I have only the correlation matrix, not the original data it was computed from, so I tried to read it into a matrix in R with this code:
B <- as.matrix(dfA)
But when I try to draw a correlation plot with the following code:
library(corrplot)
corrplot(B, method="circle")
I receive the following error:
Error in corrplot(B, method = "circle") : The matrix is not in [-1, 1]!
Kindly help me with this problem.
Upvotes: 2
Views: 3337
Reputation: 708
Update to my first post using ggplot, based on user20650's comments above. user20650 showed that the likely cause of the error was rounding error pushing some values just outside the permissible [-1, 1] range, and that rounding the matrix resolves the issue. After rounding, I was able to produce a plot using corrplot() as well.
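The comments themselves are not reproduced here, so the following is only a minimal sketch of that kind of fix, not user20650's exact code. It assumes a small hypothetical matrix `B_demo` in which one entry has drifted just below -1, checks the range, and then either rounds or clamps the values. The names `B_demo`, `B_round` and `B_clamp` are made up for illustration.

```r
# Hypothetical 2x2 example: one off-diagonal entry is slightly below -1,
# standing in for the kind of floating-point drift described above.
B_demo <- matrix(c(1, -1.0000002, -1.0000002, 1), nrow = 2,
                 dimnames = list(c("beta4", "beta5"), c("beta4", "beta5")))

# Diagnose: look for entries outside [-1, 1], which trigger the error.
print(range(B_demo))
out_of_range <- which(abs(B_demo) > 1, arr.ind = TRUE)
print(out_of_range)

# Option 1: round to the precision you actually trust (4 decimals here).
B_round <- round(B_demo, 4)

# Option 2: clamp to [-1, 1] without touching the other digits.
B_clamp <- pmax(pmin(B_demo, 1), -1)

# Plot only if corrplot is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(B_round, method = "circle")
}
```

Rounding changes every entry slightly, while clamping only touches the offending ones, so which option is preferable depends on how much precision you want to keep.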
At this point, the following code produces a plot with corrplot():
corMat<-as.matrix(dfA)
library('corrplot')
corrplot(corMat, method='circle')
You can also do this in ggplot2 with a few additional steps. I personally think it looks much better.
1) I get rid of the redundant information in the lower triangle of the matrix.
corMat[lower.tri(corMat)]<-NA
> print(corMat)
beta1 beta2 beta3 beta4 beta5 beta6 X X2 X3
beta1 1 -0.2515 -0.2157 0.7209 0.4679 -0.7205 0.1025 -0.3606 -0.0356
beta2 NA 1.0000 0.9831 0.1629 -0.5595 -0.1654 -0.0316 0.0946 0.0829
beta3 NA NA 1.0000 0.1529 -0.4976 -0.1559 -0.0266 0.0383 0.0738
beta4 NA NA NA 1.0000 -0.2753 -1.0000 0.0837 -0.1445 0.0080
beta5 NA NA NA NA 1.0000 0.2757 -0.0837 0.1451 -0.0081
beta6 NA NA NA NA NA 1.0000 0.0354 -0.3149 -0.0596
X NA NA NA NA NA NA 1.0000 0.0278 -0.0875
X2 NA NA NA NA NA NA NA 1.0000 0.2047
X3 NA NA NA NA NA NA NA NA 1.0000
2) Then I use reshape2::melt() to transform the matrix into long form and add a formatted copy of the values rounded to two decimal places. This will be useful for the plot labels.
library(reshape2)
m<-melt(corMat)
m<-data.frame(m[!is.na(m[,3]),]) # get rid of the NA matrix entries
m$value_lab<-sprintf('%.2f',m$value)
Here's what the data looks like:
> head(m)
Var1 Var2 value value_lab
1 beta1 beta1 1.0000 1.00
10 beta1 beta2 -0.2515 -0.25
11 beta2 beta2 1.0000 1.00
19 beta1 beta3 -0.2157 -0.22
20 beta2 beta3 0.9831 0.98
21 beta3 beta3 1.0000 1.00
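If you would rather not depend on reshape2, the same long-form table can be built in base R. This is a sketch on a small hypothetical matrix `cor_demo` (not the data above); `as.data.frame(as.table(...))` produces the same `Var1`/`Var2` columns, and `responseName` is used here to call the third column `value`.

```r
# Small hypothetical correlation matrix; keep the upper triangle only.
cor_demo <- matrix(c( 1.0, 0.5, -0.3,
                      0.5, 1.0,  0.2,
                     -0.3, 0.2,  1.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
cor_demo[lower.tri(cor_demo)] <- NA

# as.table() + as.data.frame() gives one row per cell: Var1, Var2, value.
m_demo <- as.data.frame(as.table(cor_demo), responseName = "value")
m_demo <- m_demo[!is.na(m_demo$value), ]           # drop the NA cells
m_demo$value_lab <- sprintf("%.2f", m_demo$value)  # two-decimal labels
print(m_demo)
```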
3) Finally, I feed this data into ggplot2 - primarily relying on geom_tile() to print the matrix and geom_text() to print the labels over each tile. You can dress this up more if you want.
library(ggplot2)
# fill colours each tile by correlation; label prints the rounded value on top
ggplot(m, aes(Var2, Var1, fill = value, label = value_lab)) +
  geom_tile() +
  geom_text() +
  xlab('') +
  ylab('') +
  theme_minimal()
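As one possible way to dress the plot up, here is a sketch that uses a diverging colour scale fixed to [-1, 1] and square tiles. It runs on a small hypothetical data frame `m_demo` with the same columns as `m` above; the colours, text size and legend title are arbitrary choices, not part of the original answer.

```r
library(ggplot2)

# Hypothetical long-form data in the same shape as `m` above.
m_demo <- data.frame(
  Var1  = c("a", "a", "b", "a", "b", "c"),
  Var2  = c("a", "b", "b", "c", "c", "c"),
  value = c(1, 0.5, 1, -0.3, 0.2, 1)
)
m_demo$value_lab <- sprintf("%.2f", m_demo$value)

p <- ggplot(m_demo, aes(Var2, Var1, fill = value, label = value_lab)) +
  geom_tile(color = "white") +                     # thin borders between tiles
  geom_text(size = 3) +
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       midpoint = 0, limits = c(-1, 1)) +  # diverging scale centred on 0
  coord_fixed() +                                  # square tiles
  labs(x = NULL, y = NULL, fill = "r") +
  theme_minimal()
print(p)
```

Fixing the scale limits to [-1, 1] keeps colours comparable across plots, even when a particular matrix does not span the whole range.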
Upvotes: 5