Happy Camper

Reputation: 23

How to read a Correlation matrix and form a Scatterplot matrix in R

I have a correlation matrix in Excel, as follows:

dfA <- read.table(text=
      "beta1   beta2   beta3   beta4   beta5   beta6       X      X2      X3
beta1  1.0000 -0.2515 -0.2157  0.7209 -0.7205  0.4679  0.1025 -0.3606 -0.0356
beta2 -0.2515  1.0000  0.9831  0.1629 -0.1654 -0.5595 -0.0316  0.0946  0.0829
beta3 -0.2157  0.9831  1.0000  0.1529 -0.1559 -0.4976 -0.0266  0.0383  0.0738
beta4  0.7209  0.1629  0.1529  1.0000 -1.0000 -0.2753  0.0837 -0.1445  0.0080
beta5  0.4679 -0.5595 -0.4976 -0.2753  1.0000  0.2757  0.0354 -0.3149 -0.0596
beta6 -0.7205 -0.1654 -0.1559 -1.0000  0.2757  1.0000 -0.0837  0.1451 -0.0081
X      0.1025 -0.0316 -0.0266  0.0837 -0.0837  0.0354  1.0000  0.0278 -0.0875
X2    -0.3606  0.0946  0.0383 -0.1445  0.1451 -0.3149  0.0278  1.0000  0.2047
X3    -0.0356  0.0829  0.0738  0.0080 -0.0081 -0.0596 -0.0875  0.2047  1.0000", 
      header=TRUE) 

I have only the correlation matrix, not the original data it was computed from, so I converted it to a matrix in R with this code:

B <- as.matrix(dfA)

But when I try to form a scatter plot matrix with the following code:

library(corrplot)
corrplot(B, method="circle")

I receive the following error:

Error in corrplot(B, method = "circle") : The matrix is not in [-1, 1]!

Kindly help me with this problem.

Upvotes: 2

Views: 3337

Answers (1)

AOGSTA

Reputation: 708

corrplot() Solution

This is an update to my first post (the ggplot solution below), based on user20650's comments above. user20650 showed that the likely source of the error was rounding noise pushing some values outside the permissible [-1, 1] range, and that rounding the matrix solves the issue. With that fix, I was able to produce a plot using corrplot() as well.
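A minimal base-R sketch of that diagnosis, assuming the culprit is floating-point noise that nudges an entry just past 1 or -1. The `noisy` matrix and the `out_of_range()` helper are made up for illustration; they are not the question's data or part of corrplot.

```r
# Made-up 2 x 2 example: the diagonal carries noise just above 1,
# as can happen when a matrix is exported from a spreadsheet
noisy <- matrix(c(1.0000001, -0.2515,
                  -0.2515,    1.0000001),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("beta1", "beta2"), c("beta1", "beta2")))

# Hypothetical helper: TRUE if any entry lies outside [-1, 1],
# which is the condition corrplot()'s error message complains about
out_of_range <- function(x) any(x < -1 | x > 1, na.rm = TRUE)

before <- out_of_range(noisy)   # TRUE
fixed  <- round(noisy, 4)       # 1.0000001 becomes exactly 1
after  <- out_of_range(fixed)   # FALSE
```

Rounding to 4 decimals matches the precision shown in the question; any precision coarser than the noise would work equally well.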

At this point, the following code produces a circle plot with corrplot():

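corMat <- as.matrix(dfA)
# Round away floating-point noise that can push entries just outside [-1, 1]
corMat <- round(corMat, 4)

library('corrplot')
corrplot(corMat, method = 'circle')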
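corrplot() only complains about the range, but a pasted or retyped correlation matrix can be wrong in other ways too. A quick sanity check, sketched in base R; `check_cor()` is a hypothetical helper name and the tolerance is an arbitrary choice:

```r
# Hypothetical helper: report the basic properties a correlation matrix should have
check_cor <- function(x, tol = 1e-8) {
  x <- as.matrix(x)
  c(square    = nrow(x) == ncol(x),                         # same number of rows and columns
    unit_diag = all(abs(diag(x) - 1) < tol),                # every variable correlates 1 with itself
    in_range  = all(x >= -1 - tol & x <= 1 + tol),          # what corrplot() checks
    symmetric = nrow(x) == ncol(x) &&
                isSymmetric(unname(x), tol = tol))          # r(a, b) must equal r(b, a)
}
```

As pasted in the question, the matrix would fail the `symmetric` check: the beta1 row lists -0.7205 under beta5, while the beta5 row lists 0.4679 under beta1. It may be worth confirming those entries against the Excel source.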


ggplot() Solution

You can also do this in ggplot2 with a few additional steps. Personally, I think the result looks much better.

1) The lower triangle of the matrix only repeats the upper triangle, so I get rid of that redundant information by setting it to NA.

corMat[lower.tri(corMat)] <- NA   # blank out the redundant lower triangle

> print(corMat)
      beta1   beta2   beta3  beta4   beta5   beta6       X      X2      X3
beta1     1 -0.2515 -0.2157 0.7209  0.4679 -0.7205  0.1025 -0.3606 -0.0356
beta2    NA  1.0000  0.9831 0.1629 -0.5595 -0.1654 -0.0316  0.0946  0.0829
beta3    NA      NA  1.0000 0.1529 -0.4976 -0.1559 -0.0266  0.0383  0.0738
beta4    NA      NA      NA 1.0000 -0.2753 -1.0000  0.0837 -0.1445  0.0080
beta5    NA      NA      NA     NA  1.0000  0.2757 -0.0837  0.1451 -0.0081
beta6    NA      NA      NA     NA      NA  1.0000  0.0354 -0.3149 -0.0596
X        NA      NA      NA     NA      NA      NA  1.0000  0.0278 -0.0875
X2       NA      NA      NA     NA      NA      NA      NA  1.0000  0.2047
X3       NA      NA      NA     NA      NA      NA      NA      NA  1.0000

2) Next, I use reshape2::melt() to reshape the matrix into long format (one row per cell), then add a formatted copy of the values rounded to two decimal places. These strings will serve as the tile labels in the plot.

library(reshape2)
m <- melt(corMat)                        # columns: Var1 (row), Var2 (column), value
m <- m[!is.na(m$value), ]                # drop the NA entries from the lower triangle
m$value_lab <- sprintf('%.2f', m$value)  # two-decimal labels for geom_text()

Here's what the data looks like:

> head(m)
    Var1  Var2   value value_lab
1  beta1 beta1  1.0000      1.00
10 beta1 beta2 -0.2515     -0.25
11 beta2 beta2  1.0000      1.00
19 beta1 beta3 -0.2157     -0.22
20 beta2 beta3  0.9831      0.98
21 beta3 beta3  1.0000      1.00
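If you would rather avoid the reshape2 dependency, base R's as.table() followed by as.data.frame() produces the same long format. A sketch on a 3 x 3 excerpt (the beta1 to beta3 block of the matrix), standing in for the full corMat:

```r
# 3 x 3 excerpt of the correlation matrix, standing in for corMat
cm <- matrix(c( 1.0000, -0.2515, -0.2157,
               -0.2515,  1.0000,  0.9831,
               -0.2157,  0.9831,  1.0000),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("beta1", "beta2", "beta3"),
                             c("beta1", "beta2", "beta3")))
cm[lower.tri(cm)] <- NA                        # step 1: blank out the lower triangle

# as.table() marks the matrix as a table; as.data.frame() then returns one row
# per cell with columns Var1 (row name), Var2 (column name) and the cell value
long <- as.data.frame(as.table(cm), responseName = "value")
long <- long[!is.na(long$value), ]             # drop the blanked-out cells
long$value_lab <- sprintf("%.2f", long$value)  # step 2: two-decimal tile labels
```

The surviving rows come out in the same order as the head(m) output above (beta1/beta1, beta1/beta2, beta2/beta2, ...), so `long` can be passed to the ggplot() call in place of `m`.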

3) Finally, I pass this data to ggplot2, relying mainly on geom_tile() to draw the matrix and geom_text() to print a label over each tile. You can dress this up further if you like.

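library(ggplot2)
# Columns (Var2) on the x axis, rows (Var1) on the y axis
ggplot(m, aes(x = Var2, y = Var1, fill = value, label = value_lab)) +
  geom_tile() +   # one coloured tile per correlation
  geom_text() +   # two-decimal label printed over each tile
  xlab('') +
  ylab('') +
  theme_minimal()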


Upvotes: 5
