t.test applied in pairs to more than two samples in R

Question

I have 44 samples where I only have its sample size, average and 1 standard deviation. I asked about the possibility of doing a t.test and some of you returned the answer:

T.test <- function(n, mean, sd) {
  s <- sum((n - 1) * sd^2) / (sum(n) - 2) # weighted variance
  t <- sqrt(prod(n) / sum(n)) * (diff(mean) / sqrt(s)) # t statistic
  df <- sum(n) - 2  # degrees of freedom
  p <- (1 - pt(abs(t), df)) * 2 # p value
  c(t = t, p = p)
}

dat <- data.frame(mean = c(6.62, 6.31), sd = c(.52, .49), n = c(10, 12))
#   mean   sd  n
# 1 6.62 0.52 10
# 2 6.31 0.49 12

T.test(dat$n, dat$mean, dat$sd)
#          t          p 
# -1.4373111  0.1660915

However I would like to perform a t.test analysis to every single pair of samples. As I have 44 samples, it would be a very hard task.

Imagine I have 5 samples:

#   mean   sd  n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
# 3 5.95 0.76 34
# 4 5.67 0.56 23
# 5 6.12 0.35 16

I would like to perfom a t.test between 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5 and get a table with all the results obtained at the same time.

The obtained table would be with the name of the samples in the first row and in the first column, so half of the table would be redundant. Both results (t.test and p-value) must appear. It should be something similar to this:

#   1 2              3              4              5
# 1 - test1 p-value1 test2 p-value2 test3 p-value3 test4 p-value4
# 2 - -              test5 p-value5 test6 p-value6 test7 p-value7
# 3 - -              -              test8 p-value8 test9 p-value9
# 4 - -              -              -              test0 p-value0
# 5 - -              -              -              -

Can anybody of you show me how could be the code in R to obtain what I request automatically using data written above? I could then update it to my full samples.

ChrKoenig · Accepted Answer

We can use the pairwise.t.test function which also provides different p-value adjustments for pairwise comparisons. This is needed, because among the many t.tests you will encounter significant differences just by chance.

We first set up the original data.frame and a data.frame to be filled.

df = data.frame(n = c(10,12,34,23,16), mean = c(6.62,6.31,5.95,5.67,6.12), sd = c(0.52,0.49,0.76,0.56,0.35))
sample_distributions = data.frame(value = numeric(0), sample = integer(0))

We then use the values in df to build normal distributions with the provided parameters and append them on sample_distributions.

for(i in 1:nrow(df)){
  values = rnorm(df$n[i], df$mean[i], df$sd[i])
  sample= rep(i, length(values))
  sample_distributions = rbind(sample_distributions, data.frame(values,sample))
}

Finally, we use these distributions to perform pairwise t.tests.

pairwise.t.test(x = sample_distributions$values, g = sample_distributions$sample, p.adjust.method = "bonferroni")

which yields:

Pairwise comparisons using t tests with pooled SD 

data:  sample_distributions$values and sample_distributions$sample 

  1      2      3      4     
2 1.0000 -      -      -     
3 0.0051 0.1524 -      -     
4 0.0099 0.2309 1.0000 -     
5 0.9955 1.0000 0.4172 0.6055

P value adjustment method: bonferroni

t.test applied in pairs to more than two samples in R

Answers (2)

data

Related Questions