Reputation: 569
I have a set of large dataframes that look like A and B:
A <- data.frame(A1=c(1,2,3,4,5),B1=c(6,7,8,9,10),C1=c(11,12,13,14,15 ))
A1 B1 C1
1 1 6 11
2 2 7 12
3 3 8 13
4 4 9 14
5 5 10 15
B <- data.frame(A2=c(6,7,7,10,11),B2=c(2,1,3,8,11),C2=c(1,5,16,7,8))
A2 B2 C2
1 6 2 1
2 7 1 5
3 7 3 16
4 10 8 7
5 11 11 8
I would like to create a vector (C) that denotes the Pearson correlation between A1 & A2, B1 & B2, and C1 & C2. In this case, for example, those correlations are:
[1] 0.95 0.92 0.46
Upvotes: 11
Views: 3310
Reputation: 132999
cor
accepts two data.frames:
A<-data.frame(A1=c(1,2,3,4,5),B1=c(6,7,8,9,10),C1=c(11,12,13,14,15 ))
B<-data.frame(A2=c(6,7,7,10,11),B2=c(2,1,3,8,11),C2=c(1,5,16,7,8))
cor(A,B)
# A2 B2 C2
# A1 0.9481224 0.9190183 0.459588
# B1 0.9481224 0.9190183 0.459588
# C1 0.9481224 0.9190183 0.459588
diag(cor(A,B))
#[1] 0.9481224 0.9190183 0.4595880
Edit:
Here are some benchmarks:
Unit: microseconds
expr min lq median uq max neval
diag(cor(A, B)) 230.292 238.4225 243.0115 255.0295 352.955 100
mapply(cor, A, B) 267.076 281.5120 286.8030 299.5260 375.087 100
unlist(Map(cor, A, B)) 250.053 259.1045 264.5635 275.9035 1146.140 100
Edit2:
And some better benchmarks using
set.seed(42)
A <- as.data.frame(matrix(rnorm(10*n),ncol=n))
B <- as.data.frame(matrix(rnorm(10*n),ncol=n))
However, I should probably mention that these benchmarks strongly depend on the number of rows.
Edit3: Since I was asked for the benchmarking code, here it is.
b <- sapply(2^(1:12), function(n) {
set.seed(42)
A <- as.data.frame(matrix(rnorm(10*n),ncol=n))
B <- as.data.frame(matrix(rnorm(10*n),ncol=n))
require(microbenchmark)
res <- print(microbenchmark(
diag(cor(A,B)),
mapply(cor, A, B),
unlist(Map(cor,A,B)),
times=10
),unit="us")
res$median
})
b <- t(b)
matplot(x=1:12,log10(b),type="l",
ylab="log10(median [µs])",
xlab="log2(n)",col=1:3,lty=1)
legend("topleft", legend=c("diag(cor(A, B))",
"mapply(cor, A, B)",
"unlist(Map(cor,A,B))"),lty=1, col=1:3)
Upvotes: 14
Reputation: 15458
You can use friend of apply functions, Map
, for that.
Map(function(x,y) cor(x,y),A,B)
$A1
[1] 0.9481224
$B1
[1] 0.9190183
$C1
[1] 0.459588
If you want the output as vector as suggested by @Jilber :
unlist(Map(function(x,y) cor(x,y),A,B))
A1 B1 C1
0.9481224 0.9190183 0.4595880
Or you can just use:
unlist(Map(cor,A,B))
A1 B1 C1
0.9481224 0.9190183 0.459588
Upvotes: 7
Reputation: 61214
Another alternative you can use mapply
function
> mapply(function(x,y) cor(x,y),A,B)
A1 B1 C1
0.9481224 0.9190183 0.4595880
Or just mapply(cor, A, B)
as suggested by @Aaron.
Upvotes: 6