biobudhan

Reputation: 309

Compute correlation in R between columns from two different data frames

I would like to compute the correlation of two columns from two different data frames.

For example:

dataframe1:

    identifier  description Score
    qzqzgz  desc1   0.12
    zzqzgq  desc2   8.98
    zzqzgg  desc3   0.55
    zzqzgc  desc4   3.66
    zzqzgz  desc5   1.22
    qqzgzz  desc6   -30.23
    zqzgzq  desc6   7.88
    zqzgzg  desc6   6.45
    zqzgzc  desc6   2.33
    zqzgzz  desc6   1.02

dataframe2:

    identifier  description S1  S2  S3  S4  S5     S6
    qzqzgz  desc1   9   3   4   6   7   4
    zzqzgq  desc2   5   3   6   2   3   6
    zzqzgg  desc3   9   9   12  12  14  13
    zzqzgc  desc4   6   4   8   6   6   6
    zzqzgz  desc5   10  5   5   5   5   11
    qqzgzz  desc6   11  12  17  12  11  17
    zqzgzq  desc6   8   2   1   4   4   3
    zqzgzg  desc6   2   4   9   9   5   10
    zqzgzc  desc6   7   5   8   5   7   3
    zqzgzz  desc6   11  5   7   9   9   12

I would like to compute the correlation between the 3rd column of dataframe1 (Score) and each of the columns 3 to 8 of dataframe2 (S1 to S6), i.e. Score with S1, Score with S2, Score with S3, Score with S4, and so on.

This is what I have written so far:

    for (i in 3:8)
    {
      cortop[i] <- cor(dataframe1$Score_top,dataframe2$i)
    }

I am new to R. Could someone help me write a loop for this?

Upvotes: 6

Views: 41550

Answers (1)

Sven Hohenstein

Reputation: 81693

You don't need a loop here:

    cor(dataframe1$Score, dataframe2[-c(1:2)])

    #             S1         S2         S3        S4       S5         S6
    # [1,] -0.555369 -0.8556331 -0.7682521 -0.629983 -0.57097 -0.6790326
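If you would rather get a plain named vector than a 1 x 6 matrix, sapply() can apply cor() to each remaining column of dataframe2. This is a minimal sketch that rebuilds the example data from the question with read.table(); the names cor_matrix and cors are only for illustration:

```r
# Example data from the question, read from inline text
dataframe1 <- read.table(header = TRUE, text = "
identifier description Score
qzqzgz desc1   0.12
zzqzgq desc2   8.98
zzqzgg desc3   0.55
zzqzgc desc4   3.66
zzqzgz desc5   1.22
qqzgzz desc6 -30.23
zqzgzq desc6   7.88
zqzgzg desc6   6.45
zqzgzc desc6   2.33
zqzgzz desc6   1.02
")
dataframe2 <- read.table(header = TRUE, text = "
identifier description S1 S2 S3 S4 S5 S6
qzqzgz desc1  9  3  4  6  7  4
zzqzgq desc2  5  3  6  2  3  6
zzqzgg desc3  9  9 12 12 14 13
zzqzgc desc4  6  4  8  6  6  6
zzqzgz desc5 10  5  5  5  5 11
qqzgzz desc6 11 12 17 12 11 17
zqzgzq desc6  8  2  1  4  4  3
zqzgzg desc6  2  4  9  9  5 10
zqzgzc desc6  7  5  8  5  7  3
zqzgzz desc6 11  5  7  9  9 12
")

# cor(vector, data frame) returns a 1 x 6 matrix, one column per S column
cor_matrix <- cor(dataframe1$Score, dataframe2[-c(1:2)])

# sapply() calls cor(x = Score, y = column) for each of S1..S6
# and returns a named numeric vector instead of a matrix
cors <- sapply(dataframe2[-c(1:2)], cor, x = dataframe1$Score)
print(round(cors, 4))
```

Both versions pair rows by position, so they assume the two data frames list the identifiers in the same order, as they do in the question.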

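By the way: your code didn't work because you cannot use $ with a variable. dataframe2$i looks for a column literally named i (and returns NULL), so you have to replace dataframe2$i with dataframe2[[i]] to access the i-th column. Note also that the column in dataframe1 is called Score, not Score_top, and that cortop has to exist before you can assign to cortop[i].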
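For completeness, here is a sketch of the loop from the question with those fixes applied, again assuming the example data from the question. It pre-allocates cortop and uses i - 2 so that columns 3 to 8 map onto positions 1 to 6 of the result:

```r
# Example data from the question, read from inline text
dataframe1 <- read.table(header = TRUE, text = "
identifier description Score
qzqzgz desc1   0.12
zzqzgq desc2   8.98
zzqzgg desc3   0.55
zzqzgc desc4   3.66
zzqzgz desc5   1.22
qqzgzz desc6 -30.23
zqzgzq desc6   7.88
zqzgzg desc6   6.45
zqzgzc desc6   2.33
zqzgzz desc6   1.02
")
dataframe2 <- read.table(header = TRUE, text = "
identifier description S1 S2 S3 S4 S5 S6
qzqzgz desc1  9  3  4  6  7  4
zzqzgq desc2  5  3  6  2  3  6
zzqzgg desc3  9  9 12 12 14 13
zzqzgc desc4  6  4  8  6  6  6
zzqzgz desc5 10  5  5  5  5 11
qqzgzz desc6 11 12 17 12 11 17
zqzgzq desc6  8  2  1  4  4  3
zqzgzg desc6  2  4  9  9  5 10
zqzgzc desc6  7  5  8  5  7  3
zqzgzz desc6 11  5  7  9  9 12
")

# Create the result vector first: one slot per column S1..S6
cortop <- numeric(6)
for (i in 3:8) {
  # [[i]] selects the i-th column by position; dataframe2$i would
  # look for a column literally named "i" and return NULL
  cortop[i - 2] <- cor(dataframe1$Score, dataframe2[[i]])
}
names(cortop) <- names(dataframe2)[3:8]
print(round(cortop, 4))
```

The loop gives the same numbers as the one-line cor() call; the vectorized version is simply shorter.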


Update:

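Since the columns of dataframe2 are factors, you have to convert them to numeric values before using cor. Be careful: as.numeric() applied directly to a factor returns the internal level codes, not the original values. as.matrix() turns factor columns into a character matrix, and changing its storage mode to numeric then gives the actual values: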

    cor(dataframe1$Score, "storage.mode<-"(as.matrix(dataframe2[-c(1:2)]), "numeric"))
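To see why the conversion matters, here is a sketch that deliberately turns S1 to S6 of the example data into factors (an assumption made only to reproduce the situation) and compares the level codes with the real values. It uses as.numeric(as.character()) on each column, an alternative to the storage.mode route; the names numeric_cols, codes and values are hypothetical:

```r
# Example data from the question, read from inline text
dataframe1 <- read.table(header = TRUE, text = "
identifier description Score
qzqzgz desc1   0.12
zzqzgq desc2   8.98
zzqzgg desc3   0.55
zzqzgc desc4   3.66
zzqzgz desc5   1.22
qqzgzz desc6 -30.23
zqzgzq desc6   7.88
zqzgzg desc6   6.45
zqzgzc desc6   2.33
zqzgzz desc6   1.02
")
dataframe2 <- read.table(header = TRUE, text = "
identifier description S1 S2 S3 S4 S5 S6
qzqzgz desc1  9  3  4  6  7  4
zzqzgq desc2  5  3  6  2  3  6
zzqzgg desc3  9  9 12 12 14 13
zzqzgc desc4  6  4  8  6  6  6
zzqzgz desc5 10  5  5  5  5 11
qqzgzz desc6 11 12 17 12 11 17
zqzgzq desc6  8  2  1  4  4  3
zqzgzg desc6  2  4  9  9  5 10
zqzgzc desc6  7  5  8  5  7  3
zqzgzz desc6 11  5  7  9  9 12
")
numeric_cols <- dataframe2[3:8]  # keep the original numbers for comparison

# Simulate the situation from the update: S1..S6 stored as factors
dataframe2[3:8] <- lapply(dataframe2[3:8], factor)

# Pitfall: as.numeric() on a factor returns the level codes (1, 2, ...)
codes <- sapply(dataframe2[3:8], as.numeric)

# Correct: convert each factor to character first, then to numeric
values <- sapply(dataframe2[3:8], function(f) as.numeric(as.character(f)))

print(cor(dataframe1$Score, values))
```

Correlations computed from codes would silently differ from the ones computed from the real values, which is why the character step is needed.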

Upvotes: 9
