Reputation: 309
I would like to compute the correlation of two columns from two different data frames.
For example:
dataframe1:
identifier description Score
qzqzgz desc1 0.12
zzqzgq desc2 8.98
zzqzgg desc3 0.55
zzqzgc desc4 3.66
zzqzgz desc5 1.22
qqzgzz desc6 -30.23
zqzgzq desc6 7.88
zqzgzg desc6 6.45
zqzgzc desc6 2.33
zqzgzz desc6 1.02
dataframe2:
identifier description S1 S2 S3 S4 S5 S6
qzqzgz desc1 9 3 4 6 7 4
zzqzgq desc2 5 3 6 2 3 6
zzqzgg desc3 9 9 12 12 14 13
zzqzgc desc4 6 4 8 6 6 6
zzqzgz desc5 10 5 5 5 5 11
qqzgzz desc6 11 12 17 12 11 17
zqzgzq desc6 8 2 1 4 4 3
zqzgzg desc6 2 4 9 9 5 10
zqzgzc desc6 7 5 8 5 7 3
zqzgzz desc6 11 5 7 9 9 12
I would like to compute the correlation between the 3rd column of dataframe1 (Score) and each sample column of dataframe2 in turn: Score and the 3rd column of dataframe2 (S1), Score and the 4th column (S2), Score and the 5th column (S3), Score and the 6th column (S4), and so on.
This is what I have written so far:
for (i in 3:8)
{
cortop[i] <- cor(dataframe1$Score_top,dataframe2$i)
}
I am new to R. Please help me write a loop for this.
Upvotes: 6
Views: 41550
Reputation: 81693
You don't need a loop here:
cor(dataframe1$Score, dataframe2[-c(1:2)])
# S1 S2 S3 S4 S5 S6
# [1,] -0.555369 -0.8556331 -0.7682521 -0.629983 -0.57097 -0.6790326
By the way: your code didn't work because you cannot use $ with a variable. Hence, you have to replace dataframe2$i with dataframe2[[i]] to access the i-th column.
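If you do want the loop, here is a minimal sketch of it using dataframe2[[i]]. The data frames are rebuilt from the tables in the question, and cortop is preallocated as a named numeric vector, which is an assumption since the question doesn't show how it was created. The column range 3:8 is taken from the question's loop.

```r
# Data rebuilt from the question's tables
dataframe1 <- data.frame(
  identifier = c("qzqzgz", "zzqzgq", "zzqzgg", "zzqzgc", "zzqzgz",
                 "qqzgzz", "zqzgzq", "zqzgzg", "zqzgzc", "zqzgzz"),
  description = c("desc1", "desc2", "desc3", "desc4", "desc5",
                  "desc6", "desc6", "desc6", "desc6", "desc6"),
  Score = c(0.12, 8.98, 0.55, 3.66, 1.22, -30.23, 7.88, 6.45, 2.33, 1.02)
)
dataframe2 <- data.frame(
  identifier = dataframe1$identifier,
  description = dataframe1$description,
  S1 = c(9, 5, 9, 6, 10, 11, 8, 2, 7, 11),
  S2 = c(3, 3, 9, 4, 5, 12, 2, 4, 5, 5),
  S3 = c(4, 6, 12, 8, 5, 17, 1, 9, 8, 7),
  S4 = c(6, 2, 12, 6, 5, 12, 4, 9, 5, 9),
  S5 = c(7, 3, 14, 6, 5, 11, 4, 5, 7, 9),
  S6 = c(4, 6, 13, 6, 11, 17, 3, 10, 3, 12)
)

cols <- 3:8  # positions of S1..S6 in dataframe2

# Preallocate the result, named after the columns it will hold (assumed setup)
cortop <- setNames(numeric(length(cols)), names(dataframe2)[cols])

for (i in cols) {
  # [[i]] extracts the i-th column as a plain vector; $i would look for a column named "i"
  cortop[names(dataframe2)[i]] <- cor(dataframe1$Score, dataframe2[[i]])
}

print(cortop)
```

Storing the results by column name rather than by i avoids leaving cortop[1] and cortop[2] empty, which is what indexing with 3:8 directly would do.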
Update:
Since the values in dataframe2 are factors, you have to convert them to numeric values before using cor:
cor(dataframe1$Score, "storage.mode<-"(as.matrix(dataframe2[-c(1:2)]), "numeric"))
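A more explicit way to write the same conversion, sketched on a small made-up data frame (df2, score and their values are stand-ins, not the real data). Going through as.character() matters here: as.numeric() applied directly to a factor returns the internal level codes, not the values you see printed.

```r
# Stand-in for a dataframe2 whose sample columns were read in as factors
df2 <- data.frame(
  identifier = c("a", "b", "c", "d"),
  description = c("d1", "d2", "d3", "d4"),
  S1 = factor(c("9", "5", "10", "6")),
  S2 = factor(c("3", "3", "9", "4"))
)
score <- c(0.12, 8.98, 0.55, 3.66)  # stand-in for dataframe1$Score

# Convert each factor column via its labels; the result is a numeric matrix
num <- sapply(df2[-c(1:2)], function(x) as.numeric(as.character(x)))

# The pitfall: level codes instead of the actual values
codes <- as.numeric(df2$S1)

cor(score, num)
```

If the columns only became factors because of how the file was read, it may be simpler to fix that at import time instead, but that depends on the input file.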
Upvotes: 9