Reputation:
I'm computing correlations between two different matrices with the rcorr() function in R:
library(Hmisc)
res <- rcorr(as.matrix(table1), as.matrix(table2), type="pearson")
It seems to work fine; however, I want to avoid within-table correlations. Any suggestions?
Upvotes: 4
Views: 15176
Reputation: 107587
Consider using R's base cor() for distinct correlations between two sets, since Hmisc's rcorr() returns all possible combinations. Notice below that the upper-right quadrant of the rcorr() output (which repeats, diagonally symmetric, in the lower left) is the entire result of cor() (rounded to two decimal places).
library(Hmisc)                      # for rcorr()
table1 <- matrix(rnorm(25), 5)      # 5 x 5 matrix of random normals
table2 <- matrix(rnorm(25), 5)
res <- rcorr(table1, table2, type="pearson")
res
#       [,1]  [,2]  [,3]  [,4]  [,5] | [,6]  [,7]  [,8]  [,9] [,10]
# [1,] 1.00 -0.55 0.95 -0.16 0.17 |-0.46 0.15 0.10 0.69 0.16
# [2,] -0.55 1.00 -0.55 -0.60 -0.79 |-0.45 -0.66 -0.22 -0.30 0.12
# [3,] 0.95 -0.55 1.00 -0.09 0.30 |-0.35 -0.05 -0.17 0.57 -0.03
# [4,] -0.16 -0.60 -0.09 1.00 0.91 | 0.92 0.53 -0.21 -0.58 -0.71
# [5,] 0.17 -0.79 0.30 0.91 1.00 | 0.78 0.41 -0.31 -0.32 -0.68
# ------------------------------------------------------------------
# [6,] -0.46 -0.45 -0.35 0.92 0.78 | 1.00 0.44 -0.14 -0.62 -0.58
# [7,] 0.15 -0.66 -0.05 0.53 0.41 | 0.44 1.00 0.68 0.13 0.13
# [8,] 0.10 -0.22 -0.17 -0.21 -0.31 |-0.14 0.68 1.00 0.59 0.80
# [9,] 0.69 -0.30 0.57 -0.58 -0.32 |-0.62 0.13 0.59 1.00 0.80
#[10,] 0.16 0.12 -0.03 -0.71 -0.68 |-0.58 0.13 0.80 0.80 1.00
# pvalues to follow ...
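Since the between-table block is just the upper-right quadrant, it can also be sliced directly out of the rcorr() result, which stores the correlations in res$r and the p-values in res$P. A minimal sketch, using the res object returned by rcorr() above:

k1 <- ncol(table1)
k2 <- ncol(table2)
between_r <- res$r[1:k1, (k1 + 1):(k1 + k2)]    # cross-table correlations only
between_P <- res$P[1:k1, (k1 + 1):(k1 + k2)]    # matching p-values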
res <- cor(table1, table2, method="pearson")
res
# [,1] [,2] [,3] [,4] [,5]
# [1,] -0.4551474 0.15080994 0.1008215 0.6894955 0.16390813
# [2,] -0.4468285 -0.66209106 -0.2154960 -0.2954581 0.11662382
# [3,] -0.3542023 -0.05474287 -0.1720881 0.5669501 -0.02880113
# [4,] 0.9246330 0.53456574 -0.2084105 -0.5807386 -0.71108552
# [5,] 0.7788395 0.40551828 -0.3122606 -0.3209273 -0.67912147
The only caveat is that significance test statistics, including t-statistics and p-values, are not available with cor(). However, they can be retrieved with cor.test(), which you can run iteratively with mapply(). Below, this is demonstrated with one test pairing and then generalized to all other column combinations. Notice that the test's estimate corresponds to the values in the cor() output.
# EXAMPLE OF FIRST COL PAIRING
res <- cor.test(table1[,1], table2[,1], method="pearson")
res
# Pearson's product-moment correlation
# data: table1[, 1] and table2[, 1]
# t = -0.88536, df = 3, p-value = 0.4412
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# -0.9542314 0.7137222
# sample estimates:
# cor
# -0.4551474
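For a single pairing, the pieces of interest can be pulled straight from the returned "htest" object:

res$estimate    # correlation coefficient (matches the cor() output)
res$p.value     # p-value of the test
res$statistic   # t-statistic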
# OBTAIN ALL MATRIX COL COMBINATIONS
tblcols <- expand.grid(1:ncol(table1), 1:ncol(table2))
# MAPPLY COR.TEST ACROSS ALL COLS
cfunc <- function(var1, var2) {
  cor.test(table1[,var1], table2[,var2], method="pearson")
}

res <- mapply(function(a, b) {
  cfunc(var1 = a, var2 = b)
}, tblcols$Var1, tblcols$Var2)
head(res)
# [,1] [,2] [,3] [,4]
# statistic -0.8853596 -0.8650936 -0.6560274 4.204994
# parameter 3 3 3 3
# p.value 0.4411699 0.4506234 0.5586316 0.02455469
# estimate -0.4551474 -0.4468285 -0.3542023 0.924633
# null.value 0 0 0 0
# alternative "two.sided" "two.sided" "two.sided" "two.sided"
# [,5] [,6] [,7] [,8]
# statistic 2.150733 0.2642326 -1.53021 -0.09495982
# parameter 3 3 3 3
# p.value 0.1206246 0.8087132 0.2234562 0.930334
# estimate 0.7788395 0.1508099 -0.6620911 -0.05474287
# null.value 0 0 0 0
# alternative "two.sided" "two.sided" "two.sided" "two.sided"
# ...
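If you want these results back in the same shape as the cor() output, the estimate and p.value rows can be unlisted and refilled column-wise. A sketch, relying on expand.grid() above varying the table1 column index fastest (which matches how matrix() fills by column):

est_mat <- matrix(unlist(res["estimate", ]), nrow = ncol(table1))   # same layout as cor(table1, table2)
p_mat   <- matrix(unlist(res["p.value", ]),  nrow = ncol(table1))   # corresponding p-values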
Upvotes: 7