Reputation: 2800
I have 44 samples where I only have its sample size, average and 1 standard deviation. I asked about the possibility of doing a t.test and some of you returned the answer:
T.test <- function(n, mean, sd) {
s <- sum((n - 1) * sd^2) / (sum(n) - 2) # weighted variance
t <- sqrt(prod(n) / sum(n)) * (diff(mean) / sqrt(s)) # t statistic
df <- sum(n) - 2 # degrees of freedom
p <- (1 - pt(abs(t), df)) * 2 # p value
c(t = t, p = p)
}
dat <- data.frame(mean = c(6.62, 6.31), sd = c(.52, .49), n = c(10, 12))
# mean sd n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
T.test(dat$n, dat$mean, dat$sd)
# t p
# -1.4373111 0.1660915
However I would like to perform a t.test analysis to every single pair of samples. As I have 44 samples, it would be a very hard task.
Imagine I have 5 samples:
# mean sd n
# 1 6.62 0.52 10
# 2 6.31 0.49 12
# 3 5.95 0.76 34
# 4 5.67 0.56 23
# 5 6.12 0.35 16
I would like to perfom a t.test between 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5 and get a table with all the results obtained at the same time.
The obtained table would be with the name of the samples in the first row and in the first column, so half of the table would be redundant. Both results (t.test and p-value) must appear. It should be something similar to this:
# 1 2 3 4 5
# 1 - test1 p-value1 test2 p-value2 test3 p-value3 test4 p-value4
# 2 - - test5 p-value5 test6 p-value6 test7 p-value7
# 3 - - - test8 p-value8 test9 p-value9
# 4 - - - - test0 p-value0
# 5 - - - - -
Can anybody of you show me how could be the code in R to obtain what I request automatically using data written above? I could then update it to my full samples.
Upvotes: 1
Views: 461
Reputation: 973
We can use the pairwise.t.test
function which also provides different p-value adjustments for pairwise comparisons. This is needed, because among the many t.tests you will encounter significant differences just by chance.
We first set up the original data.frame
and a data.frame
to be filled.
df = data.frame(n = c(10,12,34,23,16), mean = c(6.62,6.31,5.95,5.67,6.12), sd = c(0.52,0.49,0.76,0.56,0.35))
sample_distributions = data.frame(value = numeric(0), sample = integer(0))
We then use the values in df
to build normal distributions with the provided parameters and append them on sample_distributions
.
for(i in 1:nrow(df)){
values = rnorm(df$n[i], df$mean[i], df$sd[i])
sample= rep(i, length(values))
sample_distributions = rbind(sample_distributions, data.frame(values,sample))
}
Finally, we use these distributions to perform pairwise t.tests.
pairwise.t.test(x = sample_distributions$values, g = sample_distributions$sample, p.adjust.method = "bonferroni")
which yields:
Pairwise comparisons using t tests with pooled SD
data: sample_distributions$values and sample_distributions$sample
1 2 3 4
2 1.0000 - - -
3 0.0051 0.1524 - -
4 0.0099 0.2309 1.0000 -
5 0.9955 1.0000 0.4172 0.6055
P value adjustment method: bonferroni
Upvotes: 2
Reputation: 887821
We could use outer
to do the T.test
on all the combinations of rows.
res <- outer(1:nrow(dat), 1:nrow(dat), FUN=Vectorize(function(i,j) {
x1 <- dat[c(i,j), ]
T.test(x1$n, x1$mean, x1$sd)[[2]]}))
If we need only the upper triangular p-values, we can assign the lower.tri
elements to NA
.
res[lower.tri(res, diag=TRUE)] <- NA
res
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,] NA NA 0.13401244 2.075498e-03 0.241424622
#[3,] NA NA NA 1.368773e-01 0.399642479
#[4,] NA NA NA NA 0.007203030
#[5,] NA NA NA NA NA
This could be also done using combn
to return a vector
of 'p values'
v1 <- combn(1:nrow(dat), 2, FUN=function(i) {
x1 <- dat[i,]
T.test(x1$n, x1$mean, x1$sd)})[2,]
If we need the matrix
output, we can create a matrix
with NA values
res2 <- matrix(NA, 5, 5)
then fill the elements in the matrix
based on the logical index returned from lower.tri
.
res2[lower.tri(res2, diag=FALSE)] <- v1
and transpose (t
) to return the same output as in outer
.
t(res2)
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 0.1660915 0.01270188 7.317558e-05 0.007149738
#[2,] NA NA 0.13401244 2.075498e-03 0.241424622
#[3,] NA NA NA 1.368773e-01 0.399642479
#[4,] NA NA NA NA 0.007203030
#[5,] NA NA NA NA NA
dat <- structure(list(mean = c(6.62, 6.31, 5.95, 5.67, 6.12),
sd = c(0.52,
0.49, 0.76, 0.56, 0.35), n = c(10L, 12L, 34L, 23L, 16L)),
.Names = c("mean",
"sd", "n"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5"))
Upvotes: 2