Reputation: 103
I have these data
> a
a b c
1 1 -1 4
2 2 -2 6
3 3 -3 9
4 4 -4 12
5 5 -5 6
> b
d e f
1 6 -5 7
2 7 -4 4
3 8 -3 3
4 9 -2 3
5 10 -1 9
> cor(a,b)
d e f
a 1.0000000 1.0000000 0.1767767
b -1.0000000 -1.000000 -0.1767767
c 0.5050763 0.5050763 -0.6964286
The result I want is just:
cor(a,d) = 1
cor(b,e) = -1
cor(c,f) = -0.6964286
Upvotes: 8
Views: 11967
Reputation: 17011
For larger matrices Rfast::corpairs
is a bit faster than mapply
.
library(Rfast)
a <- matrix(rnorm(1e6), 1e3)
b <- matrix(rnorm(1e6), 1e3)
microbenchmark::microbenchmark(
diag = diag(cor(a, b)),
mapply = as.numeric(mapply(cor,as.data.frame(a),as.data.frame(b))),
corpairs = corpairs(a, b),
check = "equal"
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> diag 1294.1268 1303.72885 1313.41297 1306.84705 1314.44330 1426.1978 100
#> mapply 49.4884 54.91380 58.52463 57.11995 60.49635 115.6425 100
#> corpairs 8.6714 12.57265 15.29742 14.81435 16.18880 65.3170 100
Upvotes: 1
Reputation: 651
mapply
works with data frames but not matrices. That is because in data frames each column is an element, while in matrices each entry is an element.
In the answer above mapply(cor,as.data.frame(a),as.data.frame(b))
works just fine.
set.seed(1)
a=matrix(rnorm(15),nrow=5)
b=matrix(rnorm(15),nrow=5)
diag(cor(a,b))
[1] 0.2491625 -0.5313192 0.5594564
mapply(cor,as.data.frame(a),as.data.frame(b))
V1 V2 V3
0.2491625 -0.5313192 0.5594564
This is much more efficient for large matrices.
Upvotes: 2
Reputation: 121
The first answer above calculates all pairwise correlations, which is fine unless the matrices are large, and the second one doesn't work. As far as I can tell, efficient computation must be done directly, such as this code borrowed from borrowed from the arrayMagic Bioconductor package, works efficiently for large matrices:
> colCors = function(x, y) {
+ sqr = function(x) x*x
+ if(!is.matrix(x)||!is.matrix(y)||any(dim(x)!=dim(y)))
+ stop("Please supply two matrices of equal size.")
+ x = sweep(x, 2, colMeans(x))
+ y = sweep(y, 2, colMeans(y))
+ cor = colSums(x*y) / sqrt(colSums(sqr(x))*colSums(sqr(y)))
+ return(cor)
+ }
> set.seed(1)
> a=matrix(rnorm(15),nrow=5)
> b=matrix(rnorm(15),nrow=5)
> diag(cor(a,b))
[1] 0.2491625 -0.5313192 0.5594564
> mapply(cor,a,b)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> colCors(a,b)
[1] 0.2491625 -0.5313192 0.5594564
Upvotes: 12
Reputation: 176708
I would probably personally just use diag
:
> diag(cor(a,b))
[1] 1.0000000 -1.0000000 -0.6964286
But you could also use mapply
:
> mapply(cor,a,b)
a b c
1.0000000 -1.0000000 -0.6964286
Upvotes: 4