correlation between genepaires in two groups with different sample sizes

Question

I have two groups of patients which contains same dim for genes but different dim for patients (samples). Different samples in each group is a biological replication.

sample1 <- structure(c(3.990406045, 4.041745321, 4.002401404, 4.031463584, 
4.046886189, 4.00582865, 3.985265177, 3.9869788, 3.995546913, 
4.00582865, 11.75549075, 11.81394311, 11.81826206, 11.76013913, 
11.8408451, 11.83619671, 11.72858876, 11.73755609, 11.78239274, 
11.83619671, 8.647734791, 8.606480387, 8.64648886, 8.607548328, 
8.605946416, 8.646132879, 8.648268762, 8.648090771, 8.647200821, 
8.646132879, 5.359884744, 5.371302287, 5.37638989, 5.357155019, 
5.378375921, 5.381105646, 5.35281111, 5.355168988, 5.366958378, 
5.381105646, 8.805045323, 8.684889613, 8.794736874, 8.693725426, 
8.680471706, 8.791791603, 8.80946323, 8.807990594, 8.800627416, 
8.791791603, 10.87587031, 10.85539252, 10.87095037, 10.85960961, 
10.85328398, 10.86954467, 10.87797885, 10.877276, 10.87376176, 
10.86954467, 5.505422817, 5.530799682, 5.631682175, 5.422577376, 
5.584910836, 5.667756277, 5.451311664, 5.469348715, 5.559533971, 
5.667756277), .Dim = c(10L, 7L), .Dimnames = list(c("patient1", 
"patient2", "patient3", "patient4", 
"patient5", "patient6", "patient7", 
"patient8", "patient9", "patient10"
), c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6", "gene7"
)))

and

sample2 <- structure(c(3.990406045, 4.041745321, 4.002401404, 4.031463584, 
4.046886189, 4.00582865, 3.985265177, 3.9869788, 11.75549075, 
11.81394311, 11.81826206, 11.76013913, 11.8408451, 11.83619671, 
11.72858876, 11.73755609, 8.647734791, 8.606480387, 8.64648886, 
8.607548328, 8.605946416, 8.646132879, 8.648268762, 8.648090771, 
5.359884744, 5.371302287, 5.37638989, 5.357155019, 5.378375921, 
5.381105646, 5.35281111, 5.355168988, 8.805045323, 8.684889613, 
8.794736874, 8.693725426, 8.680471706, 8.791791603, 8.80946323, 
8.807990594, 10.87587031, 10.85539252, 10.87095037, 10.85960961, 
10.85328398, 10.86954467, 10.87797885, 10.877276, 5.505422817, 
5.530799682, 5.631682175, 5.422577376, 5.584910836, 5.667756277, 
5.451311664, 5.469348715), .Dim = c(8L, 7L), .Dimnames = list(
c("patient1", 
"patient2", "patient3", "patient4", 
"patient5", "patient6", "patient7", 
"patient8"), c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6", "gene7")))

Now, I want to check the correlation between gene pairs between two groups

rcorr(sample1, sample2, type="s")#spearman

I reveive:

Error in cbind(x, y) : number of rows of matrices must match (see arg 2)

but for correlation of patients t(samples) brings the correlation between patient pairs. I need correlations between gene pairs (below). Is there anything wrong? Should I consider some statistical points?

when length of patients are equal I reverie this:

> rcorr(sample1[1:8, ], sample2[1:8,], type="s")
      gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene1 gene2 gene3 gene4 gene5 gene6 gene7
gene1  1.00  0.81 -1.00  0.67 -1.00 -1.00  0.36  1.00  0.81 -1.00  0.67 -1.00 -1.00  0.36
gene2  0.81  1.00 -0.81  0.95 -0.81 -0.81  0.79  0.81  1.00 -0.81  0.95 -0.81 -0.81  0.79
gene3 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene4  0.67  0.95 -0.67  1.00 -0.67 -0.67  0.90  0.67  0.95 -0.67  1.00 -0.67 -0.67  0.90
gene5 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene6 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene7  0.36  0.79 -0.36  0.90 -0.36 -0.36  1.00  0.36  0.79 -0.36  0.90 -0.36 -0.36  1.00
gene1  1.00  0.81 -1.00  0.67 -1.00 -1.00  0.36  1.00  0.81 -1.00  0.67 -1.00 -1.00  0.36
gene2  0.81  1.00 -0.81  0.95 -0.81 -0.81  0.79  0.81  1.00 -0.81  0.95 -0.81 -0.81  0.79
gene3 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene4  0.67  0.95 -0.67  1.00 -0.67 -0.67  0.90  0.67  0.95 -0.67  1.00 -0.67 -0.67  0.90
gene5 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene6 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36 -1.00 -0.81  1.00 -0.67  1.00  1.00 -0.36
gene7  0.36  0.79 -0.36  0.90 -0.36 -0.36  1.00  0.36  0.79 -0.36  0.90 -0.36 -0.36  1.00

As is seen there is a duplication in matrix, too. Why?

Marat Talipov · Accepted Answer

Regarding the last portion of your question: rcorr binds matrices sample1 and sample2 by columns and uses the combined matrix to calculate rank correlation coefficients. If you gave different names to the genes in sample1 and sample2, e.g.:

colnames(sample1) <- sprintf('sample1.%s',colnames(sample1))
colnames(sample2) <- sprintf('sample2.%s',colnames(sample2))

you would see that you have a block matrix with the diagonal blocks corresponding to the coefficients inside each sample (sample1-sample1 and sample2-sample2), and the off-diagonal blocks--to the coefficients between sample1 and sample2.

rcorr(sample1[1:8,],sample2[1:8,],type='s')

              sample1.gene1 sample1.gene2 sample1.gene3 sample1.gene4 sample1.gene5 sample1.gene6 sample1.gene7 sample2.gene1 sample2.gene2 sample2.gene3 sample2.gene4
sample1.gene1          1.00          0.81         -1.00          0.67         -1.00         -1.00          0.36          1.00          0.81         -1.00          0.67
sample1.gene2          0.81          1.00         -0.81          0.95         -0.81         -0.81          0.79          0.81          1.00         -0.81          0.95
sample1.gene3         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample1.gene4          0.67          0.95         -0.67          1.00         -0.67         -0.67          0.90          0.67          0.95         -0.67          1.00
sample1.gene5         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample1.gene6         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample1.gene7          0.36          0.79         -0.36          0.90         -0.36         -0.36          1.00          0.36          0.79         -0.36          0.90
sample2.gene1          1.00          0.81         -1.00          0.67         -1.00         -1.00          0.36          1.00          0.81         -1.00          0.67
sample2.gene2          0.81          1.00         -0.81          0.95         -0.81         -0.81          0.79          0.81          1.00         -0.81          0.95
sample2.gene3         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample2.gene4          0.67          0.95         -0.67          1.00         -0.67         -0.67          0.90          0.67          0.95         -0.67          1.00
sample2.gene5         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample2.gene6         -1.00         -0.81          1.00         -0.67          1.00          1.00         -0.36         -1.00         -0.81          1.00         -0.67
sample2.gene7          0.36          0.79         -0.36          0.90         -0.36         -0.36          1.00          0.36          0.79         -0.36          0.90
              sample2.gene5 sample2.gene6 sample2.gene7
sample1.gene1         -1.00         -1.00          0.36
sample1.gene2         -0.81         -0.81          0.79
sample1.gene3          1.00          1.00         -0.36
sample1.gene4         -0.67         -0.67          0.90
sample1.gene5          1.00          1.00         -0.36
sample1.gene6          1.00          1.00         -0.36
sample1.gene7         -0.36         -0.36          1.00
sample2.gene1         -1.00         -1.00          0.36
sample2.gene2         -0.81         -0.81          0.79
sample2.gene3          1.00          1.00         -0.36
sample2.gene4         -0.67         -0.67          0.90
sample2.gene5          1.00          1.00         -0.36
sample2.gene6          1.00          1.00         -0.36
sample2.gene7         -0.36         -0.36          1.00

It happened that sample1 and sample2 are identical in your example, so that is why you have all block matrices equal.

UPDATE: sample1-sample2 correlations can be calculated using cor function:

library(reshape2)

# produce all combinations of column indices for sample1 and sample2
z <- expand.grid(s1=1:7,s2=1:7) 

# due to the correlation matrix symmetry, we can calculate only an upper right trigonal matrix
z <- z[z$s2

correlation between genepaires in two groups with different sample sizes

Answers (2)

Related Questions