Abigail575

Reputation: 175

Calculating average pairwise Pearson Correlation Coefficients from Data Frame in R

Suppose I have the following vectors:

IDs_Complex_1 <- c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin", "snake")
IDs_Complex_2 <- c("bat", "penguin", "goat", "elephant", "tiger")

For each vector, I would like to calculate the pairwise Pearson correlation coefficients (PCCs) between the tissue columns of the following data frame, using only the rows whose Complex_ID appears in that vector (that is, each tissue column read vertically). I then wish to find the average PCC over all possible pairs of tissue columns.

 Complex_ID        Tissue_X Tissue_Y Tissue_Z
 orangutan         5         6        7
 panda             6         7        8
 sloth             7         8        9
 mountain_gorilla  100       60       50
 dolphin           115       62       51
 snake             130       59       67
 bat               2         6        7
 penguin           15        11       12
 goat              22        23       86
 elephant          14        22       109
 tiger             0         1        7

So to illustrate this for complex 1, I wish to calculate:

  PCC_1 <- PCC of (5, 6, 7, 100, 115, 130) and (6, 7, 8, 60, 62, 59)
  PCC_2 <- PCC of (5, 6, 7, 100, 115, 130) and (7, 8, 9, 50, 51, 67)
  PCC_3 <- PCC of (6, 7, 8, 60, 62, 59) and (7, 8, 9, 50, 51, 67)

I then wish to compute the average of:

  (PCC_1, PCC_2, PCC_3) = ?

But what if I have twenty or so tissue columns, where there would be 20!/(2! * 18!) = 190 pairwise combinations (without repetition) of correlation coefficients? How would I code that?

Many thanks!

Abigail

Upvotes: 0

Views: 595

Answers (1)

StupidWolf

Reputation: 46908

If df is your data.frame:

df = structure(list(Complex_ID = structure(c(6L, 7L, 9L, 5L, 2L, 10L, 
1L, 8L, 4L, 3L, 11L), .Label = c("bat", "dolphin", "elephant", 
"goat", "mountain_gorilla", "orangutan", "panda", "penguin", 
"sloth", "snake", "tiger"), class = "factor"), Tissue_X = c(5L, 
6L, 7L, 100L, 115L, 130L, 2L, 15L, 22L, 14L, 0L), Tissue_Y = c(6L, 
7L, 8L, 60L, 62L, 59L, 6L, 11L, 23L, 22L, 1L), Tissue_Z = c(7L, 
8L, 9L, 50L, 51L, 67L, 7L, 12L, 86L, 109L, 7L)), class = "data.frame", row.names = c(NA, 
-11L))

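You can compute every pairwise Pearson correlation between the tissue columns in one call with cor(), which returns the full correlation matrix (here over all rows of df):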

    cor(df[,-1])
              Tissue_X  Tissue_Y  Tissue_Z
    Tissue_X 1.0000000 0.9748668 0.4119840
    Tissue_Y 0.9748668 1.0000000 0.5440719
    Tissue_Z 0.4119840 0.5440719 1.0000000
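
The matrix above uses all eleven rows, whereas the question asks for the average per complex. One possible way to get that is to subset the rows for each ID vector first and then average the upper triangle of the correlation matrix, which holds each pair of columns exactly once. This is only a sketch: `mean_pairwise_pcc` and its `id_col` argument are hypothetical names introduced here, and the data frame is rebuilt with `data.frame()` so that the snippet runs on its own.

```r
# Example data from the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  Complex_ID = c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin",
                 "snake", "bat", "penguin", "goat", "elephant", "tiger"),
  Tissue_X = c(5, 6, 7, 100, 115, 130, 2, 15, 22, 14, 0),
  Tissue_Y = c(6, 7, 8, 60, 62, 59, 6, 11, 23, 22, 1),
  Tissue_Z = c(7, 8, 9, 50, 51, 67, 7, 12, 86, 109, 7)
)

IDs_Complex_1 <- c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin", "snake")
IDs_Complex_2 <- c("bat", "penguin", "goat", "elephant", "tiger")

# Hypothetical helper (not part of base R): average pairwise PCC for one complex.
mean_pairwise_pcc <- function(data, ids, id_col = "Complex_ID") {
  # Keep only the rows of this complex and drop the ID column.
  sub <- data[data[[id_col]] %in% ids, setdiff(names(data), id_col), drop = FALSE]
  # Pearson correlation matrix between the remaining (tissue) columns.
  cm <- cor(sub, method = "pearson")
  # upper.tri() picks each unordered pair of columns once and skips the diagonal.
  mean(cm[upper.tri(cm)])
}

avg_1 <- mean_pairwise_pcc(df, IDs_Complex_1)
avg_2 <- mean_pairwise_pcc(df, IDs_Complex_2)
```

Nothing in the function depends on the number of tissue columns, so with twenty columns the upper triangle would simply contain the 190 pairs mentioned in the question. If a tissue column is constant within a complex, cor() returns NA for its pairs, in which case `mean(..., na.rm = TRUE)` may be preferable.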

Upvotes: 1
