user2388815

Reputation: 11

dataframe (product) correlations in R

I've got two data frames, each with 150 rows and 10 columns, plus column and row IDs. I want to correlate every row in one data frame with every row in the other (i.e. 150 x 150 correlations) and plot the distribution of the resulting 22500 values. (Then I want to calculate p-values etc. from the distribution, but that's the next step.)

Frankly, I don't know where to start with this. I can read my data in, and I can see how to correlate vectors or matching slices of two matrices, but I can't get a handle on what I'm trying to do here.

Upvotes: 1

Views: 268

Answers (2)

Vincent Zoonekynd

Reputation: 32401

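You can call cor with two arguments. It correlates the columns of the first with the columns of the second, so transpose both matrices to correlate rows: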

cor( t(m1), t(m2) )
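To make the shape of the result concrete, here is a minimal sketch on small made-up matrices. The sizes (5 and 4 rows) and the random data are assumptions for illustration only; with real data frames you would presumably call as.matrix() on the numeric columns first, leaving out any ID columns.

```r
# Hypothetical stand-ins for the two data sets: 5 and 4 rows, 10 columns each
set.seed(1)
m1 <- matrix(rnorm(50), nrow = 5)
m2 <- matrix(rnorm(40), nrow = 4)

# cor(x, y) correlates the columns of x with the columns of y,
# so transposing both matrices gives row-by-row correlations
cors <- cor(t(m1), t(m2))

# One row per row of m1, one column per row of m2
dim(cors)

# Entry [i, j] is the correlation between row i of m1 and row j of m2
all.equal(cors[2, 3], cor(m1[2, ], m2[3, ]))
```

With two 150-row inputs the same call returns a 150 x 150 matrix, which can be passed straight to hist().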

Upvotes: 1

Roland

Reputation: 132999

set.seed(42)
DF1 <- as.data.frame(matrix(rnorm(1500),150))
DF2 <- as.data.frame(matrix(runif(1500),150))

# convert to matrices for better performance
m1 <- as.matrix(DF1)
m2 <- as.matrix(DF2)

# outer() builds all combinations of row numbers and applies FUN to them;
# the 150 x 150 = 22500 results are small enough to fit into RAM
# FUN must be vectorized: Vectorize() takes care of that, but it is just
# a hidden loop (slow for a huge number of rows)
cors <- outer(seq_len(nrow(m1)), seq_len(nrow(m2)),
              FUN = Vectorize(function(i, j) cor(m1[i, ], m2[j, ])))
hist(cors)
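As a sanity check, the outer() result can be compared against a single cor() call on the transposed matrices. This is only a sketch: it regenerates the same random data with the same seed so that it runs on its own, and the names cors_outer and cors_direct are made up for the comparison.

```r
# Same simulated data as above, built directly as matrices
set.seed(42)
m1 <- matrix(rnorm(1500), 150)
m2 <- matrix(runif(1500), 150)

# Pairwise row correlations via outer() and a vectorized wrapper
cors_outer <- outer(seq_len(nrow(m1)), seq_len(nrow(m2)),
                    FUN = Vectorize(function(i, j) cor(m1[i, ], m2[j, ])))

# The same correlations from one cor() call on the transposed matrices
cors_direct <- cor(t(m1), t(m2))

# Both are 150 x 150 and should agree up to floating-point tolerance
dim(cors_outer)
all.equal(cors_outer, cors_direct)
```

If the two agree, the single cor() call is likely the better choice for larger inputs, since it avoids the hidden loop in Vectorize().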


Upvotes: 2
