Katie Nissen

Reputation: 79

How to pair all columns once with no repeats in R?

I have a dataset with 200 columns and 1000 rows of observations. I am trying to find the correlation between every pair of columns, counting each pair only once. So, for example, columns 1 & 2, columns 1 & 3, and columns 2 & 3, but NOT columns 3 & 1, because that repeats an earlier pairing. Mathematically, I should have choose(200, 2) = 19900 pairs of columns, but I can't figure out how to generate them. The code I have so far is below:

corr.results<- rep(NA,19900)
for(i in 1:19900)
  {
  column1<- i
  column2<- i+1
  
  results<- cor.test(all.null.data[ ,column1], all.null.data[ ,column2], 
                          alternative = "two.sided", method="pearson", 
                          exact=NULL, conf.level=0.95, continuity=FALSE)
  corr.results[i]<- results$p.value
}

View(corr.results)

Obviously, this is incorrect because I am only doing adjacent pairs (e.g. 1&2, 2&3, 3&4, etc.), but it's all I've got so far.

Upvotes: 1

Views: 127

Answers (2)

Ronak Shah

Reputation: 388982

Use combn to generate every unordered pair of column indices. The function passed as its third argument is applied to each pair in turn.

corr.results <- combn(seq_along(all.null.data), 2, function(x) {
  # x is a length-2 vector holding one pair of column indices
  cor.test(all.null.data[, x[1]], all.null.data[, x[2]],
           alternative = "two.sided", method = "pearson",
           exact = NULL, conf.level = 0.95, continuity = FALSE)$p.value
})

corr.results

For 200 columns this returns 19900 p-values, one per pair of columns.

ncol(combn(1:200, 2))
#[1] 19900
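
If you also need to know which pair each p-value belongs to, one option is to build the index matrix first and keep it next to the results. This is only a minimal sketch: `toy` is a made-up stand-in for all.null.data, and the names `idx` and `labelled` are hypothetical.

```r
# Hypothetical stand-in for all.null.data: 5 columns of random noise
set.seed(1)
toy <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100))

# 2 x choose(5, 2) matrix; each column holds one pair of column indices
idx <- combn(seq_along(toy), 2)

# p-value for each pair, in the same order as the columns of idx
p <- apply(idx, 2, function(x) cor.test(toy[[x[1]]], toy[[x[2]]])$p.value)

# one row per pair, with the two column indices and the p-value
labelled <- data.frame(col1 = idx[1, ], col2 = idx[2, ], p.value = p)
head(labelled)
```

Because col1 is always smaller than col2, no pair appears twice.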

Upvotes: 1

user2974951

Reputation: 10375

Keeping your loop-based approach, but with a nested (double) loop, and using the built-in mtcars dataset as a toy example:

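res <- list()
# i is the first column of each pair and j runs over every later column,
# so each unordered pair (i, j) with i < j is visited exactly once
for (i in 1:(ncol(mtcars) - 1)) {
  for (j in (i + 1):ncol(mtcars)) {
    res <- c(
      res,
      list(c(i, j, cor.test(mtcars[, i], mtcars[, j])$p.value))
    )
  }
}
# stack the (i, j, p) triples into a matrix with one row per pair
res <- do.call(rbind, res)
colnames(res) <- c("i", "j", "p")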
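
With 200 columns the loop calls cor.test 19900 times. A possible shortcut, which is a different technique from the loop above and not something the question asked for, is to compute the whole correlation matrix once with cor() and derive the two-sided Pearson p-values from the t statistic. This sketch assumes there are no missing values, so every pair uses the same n. The names `tstat`, `keep` and `fast` are hypothetical.

```r
# Assumes no missing values, so every pair shares the same sample size n
n <- nrow(mtcars)
r <- cor(mtcars)                          # Pearson correlation matrix

# t statistic for each correlation; the diagonal becomes Inf and is dropped below
tstat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = n - 2)      # two-sided p-value

# upper.tri() is TRUE only where the row index is below the column index,
# which keeps each pair once and drops the diagonal
keep <- upper.tri(r)
fast <- data.frame(i = row(r)[keep], j = col(r)[keep], p = p[keep])
head(fast)
```

Note that the rows come out in column-major order, (1,2), (1,3), (2,3), (1,4), and so on, which differs from the order produced by the loop.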

Upvotes: 1
