Reputation: 1
I have two numeric data frames, Df1 and Df2, each with 15000 columns. I want to calculate correlations only between matching columns: column 1 of Df1 with column 1 of Df2, column 2 of Df1 with column 2 of Df2, and so on for all 15000 columns. Computing a full correlation matrix generates a lot of unwanted correlations, so I am looking for a more elegant solution.
Can anyone help me here?
Thanks in advance, H.
Upvotes: 0
Views: 645
Reputation: 16978
Another solution, based on purrr (using dcarlson's data):
library(purrr)
map2_dbl(
.x = Df1,
.y = Df2,
~ cor(.x, .y)
)
This returns
#> X1 X2 X3 X4 X5
#> 0.24864047 -0.40809796 0.03718413 -0.09967868 0.46627380
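If you would rather not load purrr, base R's mapply should do the same column-by-column pairing. This is only a minimal sketch: it assumes both data frames have the same number of columns in the same order, and it re-creates dcarlson's toy data so it runs on its own.

```r
# Toy data with the same structure as in dcarlson's answer
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# mapply() walks the columns of both data frames in parallel
# and calls cor() on each matching pair
pairwise <- mapply(cor, Df1, Df2)
pairwise
```

The result is a named numeric vector with one correlation per column pair, the same shape as the map2_dbl output.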
Upvotes: 0
Reputation: 11056
You should provide reproducible data by extracting a few rows/cols of your data to illustrate what you have tried. Or just make up data with a similar structure, e.g.:
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))
Now use sapply:
idx <- ncol(Df1)
# Correlate column i of Df1 with column i of Df2, for every i
result <- sapply(seq_len(idx), function(i) cor(Df1[, i], Df2[, i]))
result
# [1] 0.24864047 -0.40809796 0.03718413 -0.09967868 0.46627380
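With 15000 columns, it may also be worth avoiding one cor() call per column. One possible approach is to standardize both data sets and take column sums of their product, which is the Pearson formula written out. This is a sketch, not a tested benchmark, and it assumes there are no missing values and no constant columns (both would produce NA or NaN here).

```r
# Same toy data as above
set.seed(42)
Df1 <- data.frame(matrix(runif(50), 10, 5))
Df2 <- data.frame(matrix(runif(50), 10, 5))

# scale() centers each column and divides it by its standard deviation
Z1 <- scale(as.matrix(Df1))
Z2 <- scale(as.matrix(Df2))

# Pearson r for column i is sum(Z1[, i] * Z2[, i]) / (n - 1)
fast <- colSums(Z1 * Z2) / (nrow(Z1) - 1)

# Sanity check against the per-column loop
loop <- sapply(seq_len(ncol(Df1)), function(i) cor(Df1[, i], Df2[, i]))
all.equal(unname(fast), loop)
```

The last line compares the two approaches, so you can check on a small subset of your own data that they agree before running it on everything.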
Upvotes: 1