Reputation: 85
I have a data frame (df) with 2 columns, e.g.:
Variable (character)  Value (numeric)
A 12.25
A 2.14
A 31.10
B 4.6
B 6.987
D 74.10
D 6.17
D 10.365
D 54.98
C 10.47
C 156.1420
C 1.69
I would like to calculate the correlation between each pair of variables, something like this (the values are completely random):
A B D C
A 0.25 0.32 0.1256 0.9
B 0.9 0.47 0.125 0.144
D 0.36 0.12 0.87 0.54
C 0.369 0.147 0.4 0.485
Upvotes: 0
Views: 727
Reputation: 52637
Assuming all your variables have the same number of observations (var and val are the column names in my example data below):
cor(as.data.frame(split(df$val, df$var)))
Produces:
a b c d
a 1.0000000 0.3332724 -0.4755813 -0.1367066
b 0.3332724 1.0000000 -0.9171748 -0.2348487
c -0.4755813 -0.9171748 1.0000000 0.5713294
d -0.1367066 -0.2348487 0.5713294 1.0000000
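To unpack the one-liner: split() turns the value column into a named list with one vector per variable, as.data.frame() turns that list into one column per variable, and cor() then correlates every pair of columns. A minimal sketch on a small hypothetical data frame (three variables with three observations each, made up for illustration):

```r
# Hypothetical example data: three variables, three observations each
df <- data.frame(var = rep(c("a", "b", "c"), each = 3),
                 val = c(1, 2, 3, 2, 4, 6, 3, 2, 1))

# Named list with one numeric vector per variable
groups <- split(df$val, df$var)

# One column per variable; cor() correlates every pair of columns
wide <- as.data.frame(groups)
m <- cor(wide)
print(m)
```

Here b is a scaled copy of a and c is a reversed, so the off-diagonal entries come out as 1 or -1. Note that cor() pairs values by row position, which is why the group sizes must match.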
That assumption doesn't hold for your data: A has 3 observations, B has 2, D has 4 and C has 3. I'm not sure how you intend to calculate correlations with unequal numbers of observations. Here is the data I used:
set.seed(1)
df <- data.frame(var=rep(letters[1:4], each=4), val=runif(16))
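To check the equal-size assumption on your own data before calling cor(), you can count the observations per variable. A minimal sketch, assuming the question's columns are named Variable and Value (the question doesn't show the actual names):

```r
# The question's data; the column names Variable and Value are assumed
df <- data.frame(
  Variable = c("A", "A", "A", "B", "B", "D", "D", "D", "D", "C", "C", "C"),
  Value = c(12.25, 2.14, 31.10, 4.6, 6.987, 74.10, 6.17, 10.365, 54.98,
            10.47, 156.1420, 1.69)
)

# Number of observations per variable (table() sorts the names)
counts <- table(df$Variable)
print(counts)

# Only build the correlation matrix when every variable has the same count
if (length(unique(counts)) == 1) {
  print(cor(as.data.frame(split(df$Value, df$Variable))))
} else {
  message("Unequal group sizes: cor() needs the same number of observations per variable")
}
```

For the data in the question the counts are 3, 2, 3 and 4 (A, B, C, D), so the sketch takes the else branch instead of calling cor().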
Upvotes: 3