Caragh
Caragh

Reputation: 69

Find correlation between columns whose names are specified as values in another dataframe

I have two dataframes, one is a list of pairs of individuals, similar to below (but with about 150 pairs):

ID_1   ID_2
X14567  X26789
X12637 X34560
X67495 X59023

The other dataframe consists of once column per individual with numerical values relating to that individuals underneath. All told about 300 columns and 300 rows. For example:

X14567  X12637  X26789  X67495  X34560  X59023
0.41    0.29    0.70    0.83    0.41    0.30
0.59    0.44    0.20    0.94    0.03    0.97
0.48    0.91    0.78    0.92    0.40    0.09
0.07    0.21    0.42    0.14    0.96    0.96
0.33    0.13    0.53    0.04    0.52    0.49
0.94    0.28    0.37    0.26    0.11    0.09

I want to find the correlation of these values between each pair of individuals. to end up with something like:

ID_1       ID_2    Correlation
X14567     X26789      -0.25
X12637     X34560      -0.25
X67495     X59023      -0.11

Is there a way that I can pull the values from the first dataframe to specify the name of the two columns that I need to find correlations between in such a way that can be easily repeated for each row of the first dataframe?

Many thanks for your help

Upvotes: 4

Views: 1649

Answers (2)

eipi10
eipi10

Reputation: 93761

If you just want the correlations between all columns in your second data frame, you can do:

library(reshape2)

df.corr = melt(cor(df))

To remove repeated columns (that is, the correlation of each column with itself):

df.corr = subset(df.corr, Var1 != Var2)

Example using built-in mtcars data frame:

mtcars.corr = melt(cor(mtcars))
    Var1 Var2       value
1    mpg  mpg  1.00000000
2    cyl  mpg -0.85216196
3   disp  mpg -0.84755138
...
119   am carb  0.05753435
120 gear carb  0.27407284
121 carb carb  1.00000000

Upvotes: 1

tcash21
tcash21

Reputation: 4995

If x and y are your two data.frames and the column names are set appropriately, you can use apply.

apply(x, 1, function(row) cor(y[row[1]], y[row[2]]))

From there just add the values to your x data.frame:

x$cor <- apply(x, 1, function(row) cor(y[row[1]], y[row[2]]))


      V1     V2        cor
2 X14567 X26789 -0.2515737
3 X12637 X34560 -0.2563294
4 X67495 X59023 -0.1092830

Upvotes: 2

Related Questions