amikoma

Reputation: 85

Correlation between many variables in one column

I have a data frame (df) with two columns, e.g.:

Variable(character):      Value(numeric):     
A                       12.25             
A                       2.14              
A                       31.10              
B                       4.6      
B                       6.987
D                       74.10
D                       6.17
D                       10.365
D                       54.98
C                       10.47
C                       156.1420
C                       1.69 

I would like to calculate the correlation between each pair of variables. Something like this (the values are completely random):

      A        B          D        C            
A     0.25     0.32       0.1256   0.9               
B     0.9      0.47       0.125    0.144
D     0.36     0.12       0.87     0.54          
C     0.369    0.147      0.4      0.485        

Upvotes: 0

Views: 727

Answers (1)

BrodieG

Reputation: 52637

Assuming all your variables have the same number of observations:

cor(as.data.frame(split(df$val, df$var)))

Produces:

           a          b          c          d
a  1.0000000  0.3332724 -0.4755813 -0.1367066
b  0.3332724  1.0000000 -0.9171748 -0.2348487
c -0.4755813 -0.9171748  1.0000000  0.5713294
d -0.1367066 -0.2348487  0.5713294  1.0000000
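
To see what the one-liner does, here is a minimal sketch that breaks it into steps, using the same simulated data listed at the end of this answer. The intermediate names groups and wide are only for illustration.

```r
# Same simulated data as in this answer: 4 groups with 4 values each.
set.seed(1)
df <- data.frame(var = rep(letters[1:4], each = 4), val = runif(16))

# split() returns a named list with one numeric vector per level of var.
groups <- split(df$val, df$var)
str(groups)

# as.data.frame() turns the equal-length vectors into columns a, b, c, d,
# so row i pairs the i-th observation of every group.
wide <- as.data.frame(groups)
dim(wide)

# cor() on a data frame returns the matrix of pairwise Pearson correlations.
cor(wide)
```

Note that the pairing is purely by position within each group, so the result only makes sense if the i-th value of one variable really belongs with the i-th value of the others.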

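The equal-size assumption does not hold for the data in your question, where the variables have different numbers of observations. I am not sure how you intend to calculate correlations between variables with unequal numbers of observations, since a correlation needs paired values. Here is the data I used: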

set.seed(1)
df <- data.frame(var=rep(letters[1:4], each=4), val=runif(16))
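
As a rough sketch of how to check the assumption before calling cor(), the snippet below rebuilds the data from the question (the column names var and val are my choice, matching the example above) and counts the observations per group. It only reports the problem; it does not try to guess how unequal groups should be paired.

```r
# Data from the question: A, B, D and C have 3, 2, 4 and 3 values.
df <- data.frame(
  var = rep(c("A", "B", "D", "C"), times = c(3, 2, 4, 3)),
  val = c(12.25, 2.14, 31.10, 4.6, 6.987, 74.10, 6.17, 10.365, 54.98,
          10.47, 156.1420, 1.69)
)

# Number of observations per variable.
counts <- table(df$var)
print(counts)

groups <- split(df$val, df$var)
if (length(unique(lengths(groups))) == 1) {
  # All groups have the same size, so they can be placed side by side.
  print(cor(as.data.frame(groups)))
} else {
  message("Groups have unequal sizes; cor() needs paired observations.")
}
```

With the question's data the sizes differ, so the else branch runs and no correlation matrix is computed.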

Upvotes: 3
