Reputation: 59
I am trying to calculate a correlation between two columns in a dataframe, based on a matching pattern in column name. In this case, I want to get the correlation between columns 'Obs1_grp1' and 'Obs1_grp2', 'Obs2_grp1' and 'Obs2_grp2', and so on.
> test
Individual Obs1_grp1 Obs2_grp1 Obs3_grp1 Obs1_grp2 Obs2_grp2 Obs3_grp2
1 293748 1583 0.09858 0.7639 15.70 0.5877 3230
2 294934 1353 0.11090 0.7639 15.58 0.5697 3604
3 290652 1400 0.11030 0.7639 14.93 0.5779 3462
4 291320 1377 0.11480 0.7639 15.63 0.5706 3130
5 292553 1253 0.10640 0.7639 15.41 0.5750 3601
6 295073 1306 0.09881 0.7639 15.54 0.5829 3630
7 290966 1523 0.10960 0.7639 15.48 0.5724 3274
8 293501 1472 0.10090 0.7639 15.31 0.5764 3507
9 295464 1603 0.10000 0.7639 15.66 0.5816 3512
10 291367 1476 0.09944 0.7639 15.38 0.5805 3116
11 295007 1517 0.11290 0.7639 14.98 0.5773 3130
12 295006 1340 0.11290 0.7639 14.98 0.5773 3130
13 290570 1314 0.10000 0.7639 15.45 0.5806 3759
14 293940 1515 0.10350 0.7639 15.55 0.5824 3564
15 293925 1259 0.10890 0.7639 15.15 0.5757 3593
16 291353 1494 0.09944 0.7639 15.38 0.5805 3116
17 292783 1637 0.10180 0.7639 15.38 0.5754 3396
18 290833 1202 0.11030 0.7639 13.66 0.5743 3106
19 291826 1298 0.10570 0.7639 16.26 0.5786 3946
20 294410 1539 0.10070 0.7639 15.39 0.5829 3550
I was thinking to use a grep function, but not sure how?
Also, I would like to put the calculated correlation in a datatable and specify for each correlation the Obs. Something like this:
> correl
Obs cor
1 Obs1 0.853
2 Obs2 0.952
Hope anyone can help here.
Upvotes: 1
Views: 428
Reputation: 3269
You can create a function like this:
cor_f <- function(x) {
cor(test[,names(test)[grepl(x, names(test))]])[2]
}
cor_f('Obs1') #correlation between Obs1_grp1 and Obs1_grp2
#0.3159908
In case you need a loop, one way would be:
vars <- c('Obs1', 'Obs2')
sapply(vars, function(i) cor_f(i))
Upvotes: 1