Reputation: 79
I have a dataset with 200 columns and 1000 rows of observations per column. I am trying to find the correlation between every pair of columns, with no repeats. So, for example, column 1 & 2, column 1 & 3, column 2 & 3, but NOT column 3 & 1, because that is the same as the pairing 1 & 3. Mathematically, I should have 19900 pairs of columns, but I can't figure out how to generate them. The code I have so far is below:
corr.results<- rep(NA,19900)
for(i in 1:19900)
{
column1<- i
column2<- i+1
results<- cor.test(all.null.data[ ,column1], all.null.data[ ,column2],
alternative = "two.sided", method="pearson",
exact=NULL, conf.level=0.95, continuity=FALSE)
corr.results[i]<- results$p.value
}
View(corr.results)
Obviously, this is incorrect because I am only doing adjacent pairs (e.g. 1&2, 2&3, 3&4, etc.), but it's all I've got so far.
Upvotes: 1
Views: 127
Reputation: 388982
Use combn to create all possible pairs of columns.
# Run cor.test on every unordered pair of column indices and keep the p-value
corr.results <- combn(seq_along(all.null.data), 2, function(x) {
  cor.test(all.null.data[, x[1]], all.null.data[, x[2]],
           alternative = "two.sided", method = "pearson",
           exact = NULL, conf.level = 0.95, continuity = FALSE)$p.value
})
corr.results
For 200 columns this returns 19900 values, one per pair of columns:
ncol(combn(1:200, 2))
#[1] 19900
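The vector returned by the combn call above does not record which two columns each p-value belongs to. A minimal sketch of one way to keep that information, assuming a numeric data frame; the names toy, pairs and labelled are made up for illustration, and toy stands in for all.null.data:

```r
# Hypothetical stand-in for all.null.data: 1000 rows, 5 numeric columns
set.seed(1)
toy <- as.data.frame(matrix(rnorm(1000 * 5), nrow = 1000))

# 2 x choose(5, 2) matrix; each column holds one pair of column indices
pairs <- combn(seq_along(toy), 2)

# One p-value per pair, in the same order as the columns of `pairs`
p <- apply(pairs, 2, function(x) cor.test(toy[, x[1]], toy[, x[2]])$p.value)

# Keep the indices next to the p-values so each result can be traced back
labelled <- data.frame(col1 = pairs[1, ], col2 = pairs[2, ], p.value = p)
head(labelled)
```

Because combn always lists the smaller index first, col1 < col2 holds in every row, so no pair appears twice.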
Upvotes: 1
Reputation: 10375
Using your approach with a double loop, on the mtcars toy dataset:
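n <- ncol(mtcars)
res <- list()
for (i in 1:(n - 1)) {      # first column of the pair
  for (j in (i + 1):n) {    # second column, always to the right of i
    # store the pair indices together with the p-value
    res[[length(res) + 1]] <- c(i, j, cor.test(mtcars[, i], mtcars[, j])$p.value)
  }
}
res <- do.call(rbind, res)  # one row per pair
colnames(res) <- c("i", "j", "p")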
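With 19900 pairs, calling cor.test once per pair can be slow. A possible alternative, not part of the answer above, is to compute the whole correlation matrix once with cor and derive the two-sided Pearson p-values from the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. This is a sketch that assumes there are no missing values; p_fast and p_loop are illustrative names:

```r
x <- as.matrix(mtcars)
n <- nrow(x)

# Full correlation matrix in one call; keep each pair once.
# The lower triangle, read column by column, follows the same
# pair order as combn: (1,2), (1,3), ..., (2,3), ...
r_mat <- cor(x)
r <- r_mat[lower.tri(r_mat)]

# Two-sided p-value from the t statistic with n - 2 degrees of freedom
tstat <- r * sqrt((n - 2) / (1 - r^2))
p_fast <- 2 * pt(-abs(tstat), df = n - 2)

# Cross-check against cor.test on every pair
p_loop <- combn(ncol(x), 2, function(k) cor.test(x[, k[1]], x[, k[2]])$p.value)
all.equal(p_fast, p_loop)
```

If the data contain NAs, cor.test drops incomplete rows pair by pair, so n would differ between pairs and this shortcut would need adjusting.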
Upvotes: 1