Reputation: 31
I am trying to write some code to easily calculate the correlation between all successive columns in a matrix.
Let's assume I have columns A, B, C, D, E.
I want the pairwise correlations AB, BC, CD, DE.
To avoid writing a loop, I have played around with sapply, but without much success so far.
I'd be grateful for any support.
Upvotes: 2
Views: 886
Reputation: 12005
Loops aren't always bad, especially if you know in advance how big your results vector should be, so you can preallocate it and then fill it in.
set.seed(1)
mat <- matrix(rnorm(50), nrow=10, ncol=5)
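succ.cor <- function(x){
  n <- ncol(x)
  col1 <- seq(n)[-n]  # left column of each pair: 1, ..., n-1
  col2 <- seq(n)[-1]  # right column of each pair: 2, ..., n
  res <- numeric(length(col1))  # preallocate the result vector
  for(i in seq_along(res)){
    res[i] <- cor(x[,col1[i]], x[,col2[i]])
  }
  res
}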
succ.cor(mat)
#[1] -0.37670337 0.60402733 0.08296412 0.34192416
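The same "know the size in advance" idea can also be written with vapply, which fixes the type and length of each result up front. This is only a sketch: the name succ.cor.vapply is hypothetical and not part of the answer above.

```r
# Hypothetical variant of succ.cor(); the function name is an assumption.
succ.cor.vapply <- function(x) {
  n <- ncol(x)
  if (n < 2) return(numeric(0))  # no successive pairs to correlate
  # vapply checks that every call returns exactly one numeric value
  vapply(seq_len(n - 1),
         function(i) cor(x[, i], x[, i + 1]),
         numeric(1))
}

set.seed(1)
mat <- matrix(rnorm(50), nrow = 10, ncol = 5)
succ.cor.vapply(mat)
```

With the same seed and matrix as above, this should return the same four values as succ.cor(mat).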
Here is a better comparison of speed between some of the methods presented here:
set.seed(1)
m=3000
n=1000
A <- as.data.frame(matrix(rnorm(m*n), m, n))
#lukeA
t1 <- Sys.time()
tmp1 <- sapply(1:(ncol(A)-1), function(x) cor(A[x], A[x+1]))
lukeA.diff <- Sys.time() - t1
lukeA.diff
#Rufo
t1 <- Sys.time()
tmp2 <- diag(cor(A[,1:(dim(A)[2]-1)], A[,2:(dim(A)[2])]))
Rufo.diff <- Sys.time() - t1
Rufo.diff
#Marc in the box
t1 <- Sys.time()
tmp3 <- succ.cor(A)
Marcinthebox.diff <- Sys.time() - t1
Marcinthebox.diff
#BrodieG
t1 <- Sys.time()
tmp4 <- cor(A)[cbind(2:ncol(A), 1:(ncol(A) - 1))]
BrodieG.diff <- Sys.time() - t1
BrodieG.diff
#Jilber (from http://stackoverflow.com/a/18535544/1199289)
t1 <- Sys.time()
tmp5 <- mapply(cor, A[,1:(dim(A)[2]-1)], A[,2:(dim(A)[2])])
Jilber.diff <- Sys.time() - t1
Jilber.diff
t(data.frame(Jilber.diff, Marcinthebox.diff, lukeA.diff, BrodieG.diff, Rufo.diff))
Jilber.diff "0.2349489 secs"
Marcinthebox.diff "0.2255359 secs"
lukeA.diff "0.408231 secs"
BrodieG.diff "6.042533 secs"
Rufo.diff "12.20104 secs"
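So it seems that the mapply approach is also fast, as are lukeA's and mine. The two approaches that compute a full correlation matrix (BrodieG's and Rufo's) are far slower on data of this size.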
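Before comparing speeds, it may be worth confirming that the methods actually return the same values. A minimal sketch on a smaller data frame follows; the object names B, k and r.* are made up for this check, and the 200 x 20 size is an arbitrary assumption.

```r
set.seed(1)
B <- as.data.frame(matrix(rnorm(200 * 20), 200, 20))  # small, arbitrary test size
k <- ncol(B)

# One result vector per method, names dropped so only the values are compared
r.sapply <- sapply(1:(k - 1), function(i) cor(B[[i]], B[[i + 1]]))
r.mapply <- unname(mapply(cor, B[, 1:(k - 1)], B[, 2:k]))
r.diag   <- unname(diag(cor(B[, 1:(k - 1)], B[, 2:k])))
r.full   <- cor(B)[cbind(2:k, 1:(k - 1))]

# Each comparison should report TRUE if the methods agree
all.equal(r.sapply, r.mapply)
all.equal(r.sapply, r.diag)
all.equal(r.sapply, r.full)

# system.time() is a compact alternative to differencing Sys.time()
system.time(mapply(cor, B[, 1:(k - 1)], B[, 2:k]))
```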
Upvotes: 1
Reputation: 52697
You can take advantage of the fact that cor automatically computes all the column-wise correlations:
cor(df)[cbind(2:ncol(df), 1:(ncol(df) - 1))]
# [1] -0.08727070 -0.10444715 0.06008165 0.18030921
Compare to:
cor(df$a, df$b)
# [1] -0.0872707
cor(df$b, df$c)
# [1] -0.1044471
Here, we compute the full correlation matrix, and then subset to get the sub-diagonal (the diagonal shifted one down from the main diagonal), which corresponds to the correlations of cols 1 - 2, 2 - 3, etc. Because a correlation matrix is symmetric, this is the same as the super-diagonal. We subset using a two-column matrix of (row, column) coordinates, created by cbind, that specifies all the sub-diagonal positions.
And here is how I generated the data:
set.seed(123)
df <- as.data.frame(replicate(5, runif(100), simplify=FALSE))
names(df) <- letters[1:ncol(df)]
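The matrix-indexing step described above can be seen on a tiny example. This is just an illustration; the objects m, idx, X and R are made up and are not part of the data used in this answer.

```r
# Indexing a matrix with a two-column matrix picks one element per row
# of the index: each row is a (row, column) pair.
m <- matrix(1:16, nrow = 4)   # filled column-wise, so m[i, j] = i + 4 * (j - 1)
idx <- cbind(2:4, 1:3)        # pairs (2,1), (3,2), (4,3)
m[idx]
# [1]  2  7 12

# For a correlation matrix, the positions just below and just above the
# main diagonal hold the same values, because the matrix is symmetric.
set.seed(123)
X <- matrix(runif(500), ncol = 5)
R <- cor(X)
all.equal(R[cbind(2:5, 1:4)], R[cbind(1:4, 2:5)])
```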
Upvotes: 2
Reputation: 54287
If you want sapply:
set.seed(1)
df <- data.frame(a=runif(100), b=runif(100), c=runif(100), d=runif(100))
sapply(1:(ncol(df)-1), function(x) cor(df[x], df[x+1]))
# [1] 0.017032146 0.009675918 0.103959503
Upvotes: 1
Reputation: 534
Let's reinvent the wheel, hehe.
aaa<-data.frame(a=runif(10),b=runif(10),c=runif(10),d=runif(10),e=runif(10))
diag(cor(aaa[,1:(dim(aaa)[2]-1)], aaa[,2:(dim(aaa)[2])]))
Upvotes: 1
Reputation: 2526
There is really no need to reinvent the wheel. Use the corrplot package:
require(corrplot)
data(mtcars)
M <- cor(mtcars)
corrplot(M, order="AOE", addCoef.col="gray40")
corrplot(M, order="AOE", method="ellipse", col="grey", cl.pos="n", addCoef.col="yellow")
To install the package:
install.packages("corrplot")
Upvotes: 1