Maany Ramanan

Reputation: 53

Correlation matrix with significance test custom function not working

I tried using the custom function (flattenCorrMatrix) from the link below, but it throws an error. I have never seen this kind of error before and haven't worked with functions in R, so I'm hoping someone can help fix it. I am also finding that I have to assign the correlation values to a different variable in order to pass them to corrplot, which produces an additional warning message. The warning is OK, since I know the plot is correct, but is there a way to get rid of both the error and the warning? I have attached my code and the data file I used. Thanks!

Link from where I got my code: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software

Error for custom function: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 36, 28

Warning message for corrplot: Warning message: In corrplot(cr3, type = "lower", p.mat = cr4$P, sig.level = 0.1, : p.mat and corr may be not paired, their rownames and colnames are not totally same!

My Code:

library(corrplot)
library(Hmisc)

#load data set including 2016-2021 data (without replicates from 2016-17)
crop.data4 <- read.csv("barleygt20.csv", header = TRUE)
variables <- crop.data4[, 6:15]

#calculate correlations
cr1 <- cor(variables)
cr2 <- rcorr(as.matrix(variables))
colnames(cr1) <- c("Yield", "Protein", "Weight", "Plump", "Thin", "Onset GT", "Peak GT", "Offset GT", "Onset-Peak GT", "Peak-Offset GT")
rownames(cr1) <- c("Yield", "Protein", "Weight", "Plump", "Thin", "Onset GT", "Peak GT", "Offset GT", "Onset-Peak GT", "Peak-Offset GT")

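# Flatten a correlation matrix and its p-value matrix into a long table
# with one row per variable pair (upper triangle only)
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = cormat[ut],
    p = pmat[ut]
  )
}
flattenCorrMatrix(cr2$r, cr2$P)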

# Correlation plot with insignificant correlations crossed out
corrplot(cr1, type = "lower", p.mat = cr2$P, sig.level = 0.05,
    tl.col = "black", method = "color", title = "A", mar = c(0, 0, 1, 0))

Data file used: https://drive.google.com/file/d/1gTgrtsrctEVevJeP-sgKdtjvT707eEcX/view?usp=sharing

Upvotes: 1

Views: 798

Answers (1)

akrun

Reputation: 887173

The error comes from the diagonal values in 'cr2$P', which are NA, whereas those in 'cr1' are 1. This creates an imbalance in length if the missing values are removed. The warning occurs because the dimnames of the two matrices differ: 'cr1' was relabelled, while 'cr2$P' still carries the original column names.

> diag(cr1)
   yield  protein  onsetgt   peakgt offsetgt  donpeak doffpeak     mgwt    plump     thin 
       1        1        1        1        1        1        1        1        1        1 
> diag(cr2$P)
   yield  protein  onsetgt   peakgt offsetgt  donpeak doffpeak     mgwt    plump     thin 
      NA       NA       NA       NA       NA       NA       NA       NA       NA       NA 

> dimnames(cr1)
[[1]]
 [1] "Yield"          "Protein"        "Weight"         "Plump"          "Thin"           "Onset GT"       "Peak GT"        "Offset GT"     
 [9] "Onset-Peak GT"  "Peak-Offset GT"

[[2]]
 [1] "Yield"          "Protein"        "Weight"         "Plump"          "Thin"           "Onset GT"       "Peak GT"        "Offset GT"     
 [9] "Onset-Peak GT"  "Peak-Offset GT"

> dimnames(cr2$P)
[[1]]
 [1] "yield"    "protein"  "onsetgt"  "peakgt"   "offsetgt" "donpeak"  "doffpeak" "mgwt"     "plump"    "thin"    

[[2]]
 [1] "yield"    "protein"  "onsetgt"  "peakgt"   "offsetgt" "donpeak"  "doffpeak" "mgwt"     "plump"    "thin" 
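The comparison above can be wrapped in a small check that runs before plotting or flattening. This is only a sketch: `check_paired` is a hypothetical helper, not a function from corrplot or Hmisc, and the 3 x 3 matrices are toy stand-ins for 'cr1' and 'cr2$P'.

```r
# Hypothetical helper (not part of corrplot or Hmisc): report whether a
# correlation matrix and a p-value matrix can be used together.
check_paired <- function(corr, pmat) {
  list(
    same_dim      = identical(dim(corr), dim(pmat)),
    same_dimnames = identical(dimnames(corr), dimnames(pmat))
  )
}

# Toy 3 x 3 matrices that mimic the situation in the question:
# same shape, but only one of them has been relabelled.
corr <- diag(3)
dimnames(corr) <- list(c("Yield", "Protein", "Weight"),
                       c("Yield", "Protein", "Weight"))
pmat <- matrix(NA_real_, 3, 3,
               dimnames = list(c("yield", "protein", "mgwt"),
                               c("yield", "protein", "mgwt")))

before <- check_paired(corr, pmat)   # shapes match, labels do not
dimnames(pmat) <- dimnames(corr)     # align the labels
after <- check_paired(corr, pmat)    # shapes and labels both match

print(before)
print(after)
```

If `same_dim` is FALSE, the two matrices were probably computed from different column selections, which is worth ruling out before looking at the labels.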

One option is to set the diagonal of 'cr1' to NA and copy its dimnames to 'cr2$P':

diag(cr1) <- NA
dimnames(cr2$P) <- dimnames(cr1)

and then rerun the plotting code:

corrplot(cr1, type="lower",p.mat = cr2$P, sig.level = 0.05,
    tl.col="black", method="color", title= "A", mar=c(0,0,1,0))
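A possible alternative, sketched here under some assumptions, is to rename the data columns once before computing anything, so that the correlation and p-value matrices share labels by construction. The sketch uses the built-in `mtcars` data as a stand-in for the barley file, made-up display labels, and `stats::cor.test` in place of `Hmisc::rcorr` so that it runs without extra packages. The only change to `flattenCorrMatrix` is an added dimension guard.

```r
# Built-in data as a stand-in for the barley file (assumption).
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# Hypothetical display labels, set once on the data rather than on one matrix.
names(vars) <- c("MPG", "Horsepower", "Weight", "Quarter mile")

# Correlation matrix; its dimnames come from the column names.
r <- cor(vars)

# Pairwise p-values via stats::cor.test, used here instead of Hmisc::rcorr.
# The diagonal is left as NA.
n <- ncol(vars)
p <- matrix(NA_real_, n, n, dimnames = dimnames(r))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    p[i, j] <- p[j, i] <- cor.test(vars[[i]], vars[[j]])$p.value
  }
}

# Same function as in the question, plus a guard that stops early
# when the two matrices do not have the same shape.
flattenCorrMatrix <- function(cormat, pmat) {
  stopifnot(identical(dim(cormat), dim(pmat)))
  ut <- upper.tri(cormat)
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut]
  )
}

flat <- flattenCorrMatrix(r, p)
print(flat)
```

With four variables the result has one row per pair, six in total. The same renaming step should carry over to `rcorr(as.matrix(vars))`, since the output above suggests it also takes its labels from the column names.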

Upvotes: 2
