antecessor
antecessor

Reputation: 2800

t.test applied in pairs to more than two samples in R

I have 44 samples where I only have its sample size, average and 1 standard deviation. I asked about the possibility of doing a t.test and some of you returned the answer:

T.test <- function(n, mean, sd) {
  s <- sum((n - 1) * sd^2) / (sum(n) - 2) # weighted variance
  t <- sqrt(prod(n) / sum(n)) * (diff(mean) / sqrt(s)) # t statistic
  df <- sum(n) - 2  # degrees of freedom
  p <- (1 - pt(abs(t), df)) * 2 # p value
  c(t = t, p = p)
}

dat <- data.frame(mean = c(6.62, 6.31), sd = c(.52, .49), n = c(10, 12))
#   mean   sd  n
# 1 6.62 0.52 10
# 2 6.31 0.49 12

T.test(dat$n, dat$mean, dat$sd)
#          t          p 
# -1.4373111  0.1660915 

However I would like to perform a t.test analysis to every single pair of samples. As I have 44 samples, it would be a very hard task.

Imagine I have 5 samples:

#   mean   sd  n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
# 3 5.95 0.76 34
# 4 5.67 0.56 23
# 5 6.12 0.35 16

I would like to perfom a t.test between 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5 and get a table with all the results obtained at the same time.

The obtained table would be with the name of the samples in the first row and in the first column, so half of the table would be redundant. Both results (t.test and p-value) must appear. It should be something similar to this:

#   1 2              3              4              5
# 1 - test1 p-value1 test2 p-value2 test3 p-value3 test4 p-value4
# 2 - -              test5 p-value5 test6 p-value6 test7 p-value7
# 3 - -              -              test8 p-value8 test9 p-value9
# 4 - -              -              -              test0 p-value0
# 5 - -              -              -              -

Can anybody of you show me how could be the code in R to obtain what I request automatically using data written above? I could then update it to my full samples.

Upvotes: 1

Views: 461

Answers (2)

ChrKoenig
ChrKoenig

Reputation: 973

We can use the pairwise.t.test function which also provides different p-value adjustments for pairwise comparisons. This is needed, because among the many t.tests you will encounter significant differences just by chance.

We first set up the original data.frame and a data.frame to be filled.

df = data.frame(n = c(10,12,34,23,16), mean = c(6.62,6.31,5.95,5.67,6.12), sd = c(0.52,0.49,0.76,0.56,0.35))
sample_distributions = data.frame(value = numeric(0), sample = integer(0))

We then use the values in df to build normal distributions with the provided parameters and append them on sample_distributions.

for(i in 1:nrow(df)){
  values = rnorm(df$n[i], df$mean[i], df$sd[i])
  sample= rep(i, length(values))
  sample_distributions = rbind(sample_distributions, data.frame(values,sample))
}

Finally, we use these distributions to perform pairwise t.tests.

pairwise.t.test(x = sample_distributions$values, g = sample_distributions$sample, p.adjust.method = "bonferroni")

which yields:

Pairwise comparisons using t tests with pooled SD 

data:  sample_distributions$values and sample_distributions$sample 

  1      2      3      4     
2 1.0000 -      -      -     
3 0.0051 0.1524 -      -     
4 0.0099 0.2309 1.0000 -     
5 0.9955 1.0000 0.4172 0.6055

P value adjustment method: bonferroni 

Upvotes: 2

akrun
akrun

Reputation: 887821

We could use outer to do the T.test on all the combinations of rows.

res <-  outer(1:nrow(dat), 1:nrow(dat), FUN=Vectorize(function(i,j) {
           x1 <- dat[c(i,j), ]
           T.test(x1$n, x1$mean, x1$sd)[[2]]}))

If we need only the upper triangular p-values, we can assign the lower.tri elements to NA.

res[lower.tri(res, diag=TRUE)] <- NA
res
#    [,1]      [,2]       [,3]         [,4]        [,5]
#[1,]   NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,]   NA        NA 0.13401244 2.075498e-03 0.241424622
#[3,]   NA        NA         NA 1.368773e-01 0.399642479
#[4,]   NA        NA         NA           NA 0.007203030
#[5,]   NA        NA         NA           NA          NA

This could be also done using combn to return a vector of 'p values'

v1 <- combn(1:nrow(dat), 2, FUN=function(i) {
         x1 <- dat[i,]
      T.test(x1$n, x1$mean, x1$sd)})[2,]

If we need the matrix output, we can create a matrix with NA values

res2 <- matrix(NA, 5, 5)

then fill the elements in the matrix based on the logical index returned from lower.tri.

res2[lower.tri(res2, diag=FALSE)] <- v1

and transpose (t) to return the same output as in outer.

t(res2)
#     [,1]      [,2]       [,3]         [,4]        [,5]
#[1,]   NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,]   NA        NA 0.13401244 2.075498e-03 0.241424622
#[3,]   NA        NA         NA 1.368773e-01 0.399642479
#[4,]   NA        NA         NA           NA 0.007203030
#[5,]   NA        NA         NA           NA          NA

data

dat <- structure(list(mean = c(6.62, 6.31, 5.95, 5.67, 6.12), 
sd = c(0.52, 
0.49, 0.76, 0.56, 0.35), n = c(10L, 12L, 34L, 23L, 16L)), 
.Names = c("mean", 
"sd", "n"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5"))

Upvotes: 2

Related Questions