Reputation: 112
I used the rcorr function from the Hmisc package to calculate correlations and p-values, then extracted the p-values into a Pval matrix and the correlation coefficients into a corr matrix.
Rvalue<-structure(c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1), .Dim = c(10L,
10L), .Dimnames = list(c("41699", "41700", "41701", "41702",
"41703", "41704", "41705", "41707", "41708", "41709"), c("41699",
"41700", "41701", "41702", "41703", "41704", "41705", "41707",
"41708", "41709")))
Pvalue<-structure(c(NA, 0, 0, 0, 0.0258814351024321, 0, 0, 0, 0, 0, 0,
NA, 6.70574706873595e-14, 0, 0, 2.1673942640632e-09, 1.08217552696743e-07,
0.0105345133269157, 0, 0, 0, 6.70574706873595e-14, NA, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, NA, 2.22044604925031e-15, 0, 0, 0, 0,
0, 0.0258814351024321, 0, 0, 2.22044604925031e-15, NA, 0, 0,
0, 0.000322310440723728, 0.00298460759118657, 0, 2.1673942640632e-09,
0, 0, 0, NA, 0, 0, 0, 0, 0, 1.08217552696743e-07, 0, 0, 0, 0,
NA, 0, 0, 0, 0, 0.0105345133269157, 0, 0, 0, 0, 0, NA, 0, 0,
0, 0, 0, 0, 0.000322310440723728, 0, 0, 0, NA, 0, 0, 0, 0, 0,
0.00298460759118657, 0, 0, 0, 0, NA), .Dim = c(10L, 10L), .Dimnames = list(
c("41699", "41700", "41701", "41702", "41703", "41704", "41705",
"41707", "41708", "41709"), c("41699", "41700", "41701",
"41702", "41703", "41704", "41705", "41707", "41708", "41709"
)))
Then I converted the corr matrix to a Boolean (0/1) matrix, Rvalue, in which 1 means a good correlation. Now I want to match the good correlations with significant p-values, and I need an edge list that includes the p-value. I implemented the following code:
n=1
m=list()
for(i in 1:nrow(Rvalue))
{
for (j in 1:nrow(Rvalue))
{
if (i<j & Pvalue[i,j]<0.05 & Rvalue[i,j]==1)
{
m[[n]]<-c(rownames(Rvalue)[i], colnames(Rvalue)[j], signif(Pvalue[i,j], digits = 4))
n=n+1
}
}
print(i)
}
The output is:
> m
[[1]]
[1] "41699" "41700" "0"
[[2]]
[1] "41699" "41701" "0"
[[3]]
[1] "41699" "41702" "0"
[[4]]
[1] "41699" "41704" "0"
...
The result is correct, but because the matrices are very large, the loop takes a long time. How can I speed this up? Please note that I need the node names in the output. Is there a function that does this? I also found two similar questions, but neither was exactly what I needed. Thanks in advance.
Upvotes: 2
Views: 315
Reputation: 82
Since your matrix has a large number of rows and columns, it is a good idea to avoid nested for loops. You can use the mapply function instead, which is more convenient.
mapply(FUN, ...)
For FUN, use a function like the following:
myf = function(x) { x < threshold }  # threshold is your cutoff value
You can call mapply(FUN = myf, "Your Matrix") twice to check whether the elements of the correlation and p-value matrices satisfy their thresholds.
Store the results in P1 and P2 (mapply returns them as plain logical vectors in column-major order), then multiply P1 and P2 element-wise.
myf1 = function(x) {x<0.05}
myf2 = function(x) {x>0.7}
P1 = mapply(FUN = myf1 , matP)
P2 = mapply(FUN = myf2 , matR)
P = P1 * P2
The elements of P equal to 1 mark the desired node pairs (TRUE * TRUE evaluates to 1, and NA marks the diagonal).
And here is the result for your sample:
P1 = mapply(FUN = myf1 , Pvalue)
P2 = mapply(FUN = myf2 , Rvalue)
P = P1 * P2
NA 1 1 1 0 1 1 0 1 1
1 NA 0 0 0 0 0 0 1 1
1 0 NA 1 0 1 1 1 1 1
1 0 1 NA 0 1 1 0 1 1
0 0 0 0 NA 1 0 1 0 0
1 0 1 1 1 NA 1 1 1 1
1 0 1 1 0 1 NA 1 1 1
0 0 1 0 1 1 1 NA 0 0
1 1 1 1 0 1 1 0 NA 1
1 1 1 1 0 1 1 0 1 NA
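Comparison operators in R are already vectorized over matrices, so the same mask can probably be built without mapply, which also keeps the dimensions and the row and column names needed for node labels. A minimal sketch, assuming a small made-up pair of matrices (`R_demo` and `P_demo` are hypothetical stand-ins, not the data from the question):

```r
# Hypothetical 3x3 demo matrices (not the question's data); both are symmetric
ids <- c("a", "b", "c")
R_demo <- matrix(c(1, 1, 0,
                   1, 1, 1,
                   0, 1, 1), 3, 3, dimnames = list(ids, ids))
P_demo <- matrix(c(NA,   0.01, 0.20,
                   0.01, NA,   0.03,
                   0.20, 0.03, NA), 3, 3, dimnames = list(ids, ids))

# Element-wise comparisons return logical matrices with dimnames intact
mask <- (P_demo < 0.05) & (R_demo > 0.7)

# For contrast: mapply() over a matrix returns a plain vector,
# so dim and dimnames are lost
flat <- mapply(function(x) x < 0.05, P_demo)
```

Because `mask` keeps its dimnames, a pair can be looked up by name, for example `mask["a", "b"]`.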
Upvotes: 1
Reputation: 887108
You could try
indx <- which(Rvalue==1 & Pvalue < 0.05 & !is.na(Pvalue), arr.ind=TRUE)
d1 <- data.frame(rN=row.names(Rvalue)[indx[,1]],
cN=colnames(Rvalue)[indx[,2]], Pval=signif(Pvalue[indx],
digits=4))
head(d1,2)
# rN cN Pval
#1 41700 41699 0
#2 41701 41699 0
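Since the matrices are symmetric and the condition has no triangle restriction, the output above contains each pair in both orientations (for example row 41700, column 41699), whereas the loop in the question keeps only i < j. One way to get each pair once is to add `upper.tri()` to the condition. A minimal sketch, where `edge_list_upper`, `R_demo` and `P_demo` are hypothetical names introduced only for illustration:

```r
# Hypothetical helper (not part of the original answer): keep each pair once,
# matching the i < j condition in the question's loop
edge_list_upper <- function(Rmat, Pmat, cutoff = 0.05) {
  keep <- Rmat == 1 & Pmat < cutoff & !is.na(Pmat) & upper.tri(Rmat)
  indx <- which(keep, arr.ind = TRUE)
  data.frame(rN = rownames(Rmat)[indx[, 1]],
             cN = colnames(Rmat)[indx[, 2]],
             Pval = signif(Pmat[indx], digits = 4),
             stringsAsFactors = FALSE)
}

# Small made-up symmetric example (not the question's data)
ids <- c("a", "b", "c")
R_demo <- matrix(c(1, 1, 0, 1, 1, 1, 0, 1, 1), 3, 3,
                 dimnames = list(ids, ids))
P_demo <- matrix(c(NA, 0.01, 0.20, 0.01, NA, 0.03, 0.20, 0.03, NA), 3, 3,
                 dimnames = list(ids, ids))
edges <- edge_list_upper(R_demo, P_demo)
```

In this demo only the pairs a-b and b-c pass both conditions, so `edges` has two rows.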
Not sure why you are getting the same result when you change the cutoff. It may be that the P values are so small that the condition is TRUE for every cutoff you tried. Here is an example showing that it does return different values. Suppose I create a function from the above code:
f1 <- function(Rmat, Pmat, cutoff){
indx <- which(Rmat==1 & Pmat < cutoff & !is.na(Pmat), arr.ind=TRUE)
d1 <- data.frame(rN=row.names(Rmat)[indx[,1]],
cN=colnames(Rmat)[indx[,2]], Pval=signif(Pmat[indx],
digits=4))
d1}
f1(R1, P1, 0.05)
# rN cN Pval
#1 B A 0.021
#2 C A 0.018
#3 D A 0.001
#4 A B 0.021
#5 A C 0.018
#6 E C 0.034
#7 A D 0.001
#8 C E 0.034
f1(R1, P1, 0.01)
# rN cN Pval
#1 D A 0.001
#2 A D 0.001
f1(R1, P1, 0.001)
#[1] rN cN Pval
#<0 rows> (or 0-length row.names)
The example matrices R1 and P1 used above were created as follows:
set.seed(24)
R1 <- matrix(sample(c(0,1), 5*5, replace=TRUE), 5,5,
dimnames=list(LETTERS[1:5], LETTERS[1:5]))
R1[lower.tri(R1)] <- 0
R1 <- R1+t(R1)
diag(R1) <- 1
set.seed(49)
P1 <- matrix(sample(seq(0,0.07, by=0.001), 5*5, replace=TRUE), 5, 5,
dimnames=list(LETTERS[1:5], LETTERS[1:5]))
P1[lower.tri(P1)] <- 0
P1 <- P1+t(P1)
diag(P1) <- NA
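To gain some confidence that the vectorized `which()` approach selects the same pairs as a nested loop, the two can be compared on made-up data. A minimal sketch, assuming randomly generated symmetric matrices (`R_demo`, `P_demo` and the size `n` are hypothetical and chosen only for illustration):

```r
# Made-up symmetric test matrices (hypothetical, not the question's data)
set.seed(1)
n <- 50
ids <- paste0("n", seq_len(n))
R_demo <- matrix(0, n, n, dimnames = list(ids, ids))
R_demo[upper.tri(R_demo)] <- rbinom(n * (n - 1) / 2, 1, 0.5)
R_demo <- R_demo + t(R_demo)
diag(R_demo) <- 1
P_demo <- matrix(0, n, n, dimnames = list(ids, ids))
P_demo[upper.tri(P_demo)] <- runif(n * (n - 1) / 2, 0, 0.1)
P_demo <- P_demo + t(P_demo)
diag(P_demo) <- NA

# Loop version, following the logic of the question (i < j only)
loop_pairs <- character(0)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i < j && P_demo[i, j] < 0.05 && R_demo[i, j] == 1) {
      loop_pairs <- c(loop_pairs, paste(ids[i], ids[j]))
    }
  }
}

# Vectorized version restricted to the upper triangle
indx <- which(R_demo == 1 & P_demo < 0.05 & upper.tri(R_demo), arr.ind = TRUE)
vec_pairs <- paste(ids[indx[, 1]], ids[indx[, 2]])

# Same set of pairs; only the ordering differs (row-wise vs column-wise)
same <- setequal(loop_pairs, vec_pairs)
```

Wrapping each version in `system.time()` on larger matrices would show the speed difference on your own machine.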
Upvotes: 2