user3789396

Reputation: 112

Match information from a correlation matrix according to their p-value cut off

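I used the rcorr function from the Hmisc package to calculate correlations and p-values, then extracted the p-values into one matrix and the correlation coefficients into another. The sample data below are the correlation matrix after conversion to 0/1 (Rvalue) and the p-value matrix (Pvalue).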

Rvalue<-structure(c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 
1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1), .Dim = c(10L, 
10L), .Dimnames = list(c("41699", "41700", "41701", "41702", 
"41703", "41704", "41705", "41707", "41708", "41709"), c("41699", 
"41700", "41701", "41702", "41703", "41704", "41705", "41707", 
"41708", "41709")))

Pvalue<-structure(c(NA, 0, 0, 0, 0.0258814351024321, 0, 0, 0, 0, 0, 0, 
NA, 6.70574706873595e-14, 0, 0, 2.1673942640632e-09, 1.08217552696743e-07, 
0.0105345133269157, 0, 0, 0, 6.70574706873595e-14, NA, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, NA, 2.22044604925031e-15, 0, 0, 0, 0, 
0, 0.0258814351024321, 0, 0, 2.22044604925031e-15, NA, 0, 0, 
0, 0.000322310440723728, 0.00298460759118657, 0, 2.1673942640632e-09, 
0, 0, 0, NA, 0, 0, 0, 0, 0, 1.08217552696743e-07, 0, 0, 0, 0, 
NA, 0, 0, 0, 0, 0.0105345133269157, 0, 0, 0, 0, 0, NA, 0, 0, 
0, 0, 0, 0, 0.000322310440723728, 0, 0, 0, NA, 0, 0, 0, 0, 0, 
0.00298460759118657, 0, 0, 0, 0, NA), .Dim = c(10L, 10L), .Dimnames = list(
c("41699", "41700", "41701", "41702", "41703", "41704", "41705", 
"41707", "41708", "41709"), c("41699", "41700", "41701", 
"41702", "41703", "41704", "41705", "41707", "41708", "41709"
)))

I then converted the correlation matrix to a Boolean (0/1) matrix, in which 1 means a good correlation. Now I want to match the good correlations with significant p-values, and I need an edge list that includes the p-value. I implemented the following code:

n <- 1
m <- list()
for (i in 1:nrow(Rvalue)) {
  for (j in 1:nrow(Rvalue)) {
    # keep each pair once (i < j) when it is both significant and well correlated
    if (i < j & Pvalue[i, j] < 0.05 & Rvalue[i, j] == 1) {
      m[[n]] <- c(rownames(Rvalue)[i], colnames(Rvalue)[j], signif(Pvalue[i, j], digits = 4))
      n <- n + 1
    }
  }
  print(i)
}

The output is:

> m
[[1]]
[1] "41699" "41700" "0"    

[[2]]
[1] "41699" "41701" "0"    

[[3]]
[1] "41699" "41702" "0"    

[[4]]
[1] "41699" "41704" "0" 
...

The result is correct, but because the matrices are very large, the loop takes a long time. How can I speed this up? Please note that I need the node names. Is there a function for this? I found two similar questions, but neither was exactly what I needed. Thanks in advance.

Upvotes: 2

Views: 315

Answers (2)

PNS

Reputation: 82

Since your matrix has a large number of rows and columns, it is a good idea to avoid nested for loops. You can use the mapply function instead, which is handier.

mapply(FUN = NULL , ...)

In place of FUN, use a function like the following, where threshold is your chosen cutoff:

myf = function(x) { x < threshold }  # TRUE where x is less than the threshold

Call mapply(FUN = myf, <your matrix>) twice, once on the p-value matrix and once on the correlation matrix, to check which elements meet each threshold. Store the results as P1 and P2 (mapply returns them as plain vectors in column-major order), then multiply P1 and P2 element-wise.

myf1 = function(x) {x<0.05}
myf2 = function(x) {x>0.7}

P1 = mapply(FUN = myf1 , matP)

P2 = mapply(FUN = myf2 , matR)

P = P1 * P2

The elements of P that equal 1 mark the desired node pairs; the NA entries come from the NA diagonal of the p-value matrix. This should work fine.

And here is the result for your sample:

P1 = mapply(FUN = myf1 , Pvalue)
P2 = mapply(FUN = myf2 , Rvalue)
P = P1 * P2

NA 1 1 1 0 1 1 0 1 1
1 NA 0 0 0 0 0 0 1 1
1 0 NA 1 0 1 1 1 1 1
1 0 1 NA 0 1 1 0 1 1
0 0 0 0 NA 1 0 1 0 0
1 0 1 1 1 NA 1 1 1 1
1 0 1 1 0 1 NA 1 1 1
0 0 1 0 1 1 1 NA 0 0
1 1 1 1 0 1 1 0 NA 1
1 1 1 1 0 1 1 0 1 NA
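
One caveat with the mapply approach is that it returns a plain vector, so the matrix shape and the node names are lost. Below is a minimal sketch of the same thresholding done with direct comparisons, which are already vectorized over matrices and keep dim and dimnames. The toy matrices Rm and Pm and the 0.7 / 0.05 cutoffs are illustrative assumptions, not the asker's data.

```r
# Toy 3 x 3 symmetric matrices (hypothetical values, for illustration only)
nodes <- c("a", "b", "c")
Rm <- matrix(c(1.0, 0.9, 0.2,
               0.9, 1.0, 0.8,
               0.2, 0.8, 1.0), 3, 3, dimnames = list(nodes, nodes))
Pm <- matrix(c(NA,   0.01, 0.60,
               0.01, NA,   0.03,
               0.60, 0.03, NA), 3, 3, dimnames = list(nodes, nodes))

# Comparison operators work element-wise on whole matrices and
# preserve dim and dimnames, so no mapply call is needed
keep <- (Pm < 0.05) & (Rm > 0.7)

# If mapply is used anyway, the flattened result can be reshaped
flat <- mapply(function(x) x < 0.05, Pm)
dim(flat) <- dim(Pm)
dimnames(flat) <- dimnames(Pm)
```

Here keep is a logical matrix with the original row and column names, so the node names can be looked up directly from it.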

Upvotes: 1

akrun

Reputation: 887108

You could try

indx <- which(Rvalue==1 & Pvalue < 0.05 & !is.na(Pvalue), arr.ind=TRUE)
d1 <- data.frame(rN=row.names(Rvalue)[indx[,1]],
                 cN=colnames(Rvalue)[indx[,2]],
                 Pval=signif(Pvalue[indx], digits=4))

head(d1,2)
#     rN    cN Pval
#1 41700 41699    0
#2 41701 41699    0
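
Because both matrices are symmetric, the call above returns every pair twice, once as (i, j) and once as (j, i), whereas the loop in the question keeps only i < j. Here is a sketch of one way to get one row per pair by adding upper.tri() to the condition. The toy matrices Rm and Pm and the column names from/to are made up for illustration.

```r
# Toy symmetric inputs (hypothetical values, for illustration only)
nodes <- c("a", "b", "c")
Rm <- matrix(c(1, 1, 0,
               1, 1, 1,
               0, 1, 1), 3, 3, dimnames = list(nodes, nodes))
Pm <- matrix(c(NA,   0.01, 0.60,
               0.01, NA,   0.03,
               0.60, 0.03, NA), 3, 3, dimnames = list(nodes, nodes))

# upper.tri() is TRUE only where row index < column index,
# which mirrors the i < j test in the original loop
sel  <- upper.tri(Rm) & Rm == 1 & !is.na(Pm) & Pm < 0.05
indx <- which(sel, arr.ind = TRUE)

# One row per unordered pair, with node names and rounded p-value
edges <- data.frame(from = rownames(Rm)[indx[, 1]],
                    to   = colnames(Rm)[indx[, 2]],
                    Pval = signif(Pm[indx], digits = 4),
                    stringsAsFactors = FALSE)
```

On symmetric input this should give half as many rows as the unrestricted version, with the same pairs the nested loop would collect.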

Update

I am not sure why you get the same result when you change the cutoff. It is possible that the p-values are so small that the condition is TRUE for every cutoff you tried. Here is an example showing that the code does return different results for different cutoffs. Suppose I wrap the code above in a function:

 f1 <- function(Rmat, Pmat, cutoff){
   indx <- which(Rmat==1 & Pmat < cutoff & !is.na(Pmat), arr.ind=TRUE)
   d1 <- data.frame(rN=row.names(Rmat)[indx[,1]],
                    cN=colnames(Rmat)[indx[,2]],
                    Pval=signif(Pmat[indx], digits=4))
   d1
 }

 f1(R1, P1, 0.05)
 #  rN cN  Pval
 #1  B  A 0.021
 #2  C  A 0.018
 #3  D  A 0.001
 #4  A  B 0.021
 #5  A  C 0.018
 #6  E  C 0.034
 #7  A  D 0.001
 #8  C  E 0.034

 f1(R1, P1, 0.01)
 #  rN cN  Pval
 #1  D  A 0.001
 #2  A  D 0.001

 f1(R1, P1, 0.001)
 #[1] rN   cN   Pval
 #<0 rows> (or 0-length row.names)

data

set.seed(24)
R1 <- matrix(sample(c(0,1), 5*5, replace=TRUE), 5,5, 
            dimnames=list(LETTERS[1:5], LETTERS[1:5]))
R1[lower.tri(R1)] <- 0
R1 <- R1+t(R1)
diag(R1) <- 1


set.seed(49)
P1 <- matrix(sample(seq(0,0.07, by=0.001), 5*5, replace=TRUE), 5, 5,
       dimnames=list(LETTERS[1:5], LETTERS[1:5]))

P1[lower.tri(P1)] <- 0
P1 <- P1+t(P1)
diag(P1) <- NA

Upvotes: 2
