Reputation: 787
I have the following dataframe:
set.seed(1)
y <- data.frame(a1 = rnorm(5) , b1 = rnorm(5), c1 = rnorm(5), a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))
I would like to obtain the correlations of the pairs of columns: cor(a1,a2), cor(b1,b2), cor(c1,c2)
I tried the following, but the output is all NA:
apply(y,2,function(x) cor(x[1],x[3]))
I would like to get the result equivalent to
cor(y[,1],y[,4])
cor(y[,2],y[,5])
cor(y[,3],y[,6])
In my actual data frame, I have many more pairs of columns.
Any ideas?
Thanks for your support.
Upvotes: 0
Views: 570
Reputation: 121608
Another approach uses a regular expression on the column names. It also works when the columns are in arbitrary order, because each pair is matched by name rather than by position.
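nn <- unique(sub('[0-9]+$', '', names(y)))
sapply(nn, function(x) {
  # select the columns whose names are this prefix followed by digits
  xy <- y[, grep(paste0('^', x, '[0-9]+$'), names(y))]
  cor(xy[, 1], xy[, 2])
})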
a b c
-0.7615458 0.5683647 0.5594564
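To illustrate the claim about column order, here is a minimal sketch that wraps the name-matching idea in a helper and checks that shuffling the columns leaves the result unchanged. The names `pair_cor` and `shuffled` are hypothetical, introduced only for this example, and the helper assumes every prefix occurs in exactly two columns.

```r
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5),
                a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

# Hypothetical helper: correlate the two columns that share each name prefix
pair_cor <- function(df) {
  # strip trailing digits to get the prefixes; sort so the output order is stable
  prefixes <- sort(unique(sub("[0-9]+$", "", names(df))))
  sapply(prefixes, function(p) {
    cols <- grep(paste0("^", p, "[0-9]+$"), names(df))
    stopifnot(length(cols) == 2)  # assumption: exactly one pair per prefix
    cor(df[[cols[1]]], df[[cols[2]]])
  })
}

# Same data, columns in a different order
shuffled <- y[, c("c2", "a1", "b2", "c1", "a2", "b1")]
all.equal(pair_cor(y), pair_cor(shuffled))
```

Because `cor` is symmetric in its two arguments, it does not matter which column of a pair comes first after shuffling.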
Upvotes: 0
Reputation: 89097
num.vars <- length(y)
var1 <- head(names(y), num.vars / 2)
var2 <- tail(names(y), num.vars / 2)
mapply(cor, y[var1], y[var2])
# a1 b1 c1
# 0.2491625 -0.5313192 0.5594564
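As a sanity check, the following sketch compares the positional `mapply` pairing with the explicit `cor` calls from the question. It assumes the columns are laid out so that the first half pairs with the second half in the same order. The names `half`, `res`, and `expected` are illustrative only.

```r
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5),
                a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

half <- ncol(y) / 2
stopifnot(half == round(half))  # assumption: an even number of columns

# Pair column i with column i + half; result names come from the first half
res <- mapply(cor, y[seq_len(half)], y[half + seq_len(half)])

# The explicit calls from the question, named to match
expected <- c(a1 = cor(y[, 1], y[, 4]),
              b1 = cor(y[, 2], y[, 5]),
              c1 = cor(y[, 3], y[, 6]))

all.equal(res, expected)
```

The `apply(y, 2, ...)` attempt in the question behaves differently because `apply` hands the function one column at a time, so `x[1]` and `x[3]` are single values rather than whole columns.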
Upvotes: 4