How to pair all columns once with no repeats in R?

Question

I have a dataset with 200 columns and 1000 rows of observations for each column. I am trying to find the correlation between every pair of columns, with no repeats. So, for example, column 1 & 2, column 1 & 3, column 2 & 3, but NOT column 3 & 1 because that is the same as the 1 & 3 pairing. Mathematically, I should have 19900 pairs of columns (200 * 199 / 2), but I can't figure out how to get that. The code I have so far is below:

corr.results<- rep(NA,19900)
for(i in 1:19900)
  {
  column1<- i
  column2<- i+1
  
  results<- cor.test(all.null.data[ ,column1], all.null.data[ ,column2], 
                          alternative = "two.sided", method="pearson", 
                          exact=NULL, conf.level=0.95, continuity=FALSE)
  corr.results[i]<- results$p.value
}

View(corr.results)

Obviously, this is incorrect because I am only doing adjacent pairs (e.g. 1&2, 2&3, 3&4, etc.), but it's all I've got so far.

Ronak Shah · Accepted Answer

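Use combn to generate every unordered pair of column indices. Its third argument is a function that is applied to each pair, so the correlation test can run inside the same call.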

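# seq_along() yields 1..ncol for a data frame; for a matrix use seq_len(ncol(all.null.data))
corr.results <- combn(seq_along(all.null.data), 2, function(x) {
  # x holds one pair of column indices, e.g. c(1, 2)
  cor.test(all.null.data[, x[1]], all.null.data[, x[2]],
           alternative = "two.sided", method = "pearson",
           exact = NULL, conf.level = 0.95, continuity = FALSE)$p.value
})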

corr.results

For 200 columns this returns 19900 p-values, one per pair, in the same order in which combn generates the pairs.

ncol(combn(1:200, 2))
#[1] 19900
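
The vector above does not record which two columns each p-value belongs to. If that is needed, one possible approach is to keep the index matrix that combn returns and bind it to the results. This is only a minimal sketch: toy.data is a hypothetical 5-column stand-in for all.null.data, and the names pairs, p.values and labelled are made up for illustration.

```r
# Hypothetical stand-in for all.null.data: 5 columns of random noise.
set.seed(1)
toy.data <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100, ncol = 5))

# Each column of `pairs` holds one unordered pair of column indices.
pairs <- combn(seq_along(toy.data), 2)

# One Pearson p-value per pair, in the same order as the columns of `pairs`.
p.values <- apply(pairs, 2, function(idx) {
  cor.test(toy.data[[idx[1]]], toy.data[[idx[2]]], method = "pearson")$p.value
})

# One row per pair: the two column indices and the p-value for that pair.
labelled <- data.frame(column1 = pairs[1, ],
                       column2 = pairs[2, ],
                       p.value = p.values)

nrow(labelled)  # choose(5, 2) rows, one per pair
```

Swapping toy.data for the real data frame should give 19900 rows for 200 columns, with column1 always smaller than column2, so no pairing appears twice.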
