Arthur Pennt

Reputation: 155

Multiple "One-Sample t-test" in R

I have a data.frame, which is similar to this one:

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

Structure of the actual dataframe:

str(cb)
data.frame: 66936 obs of 89 variables: 
$group: Factor w/ 5 levels "A", "B", "C" ...
$WC: int 19 28 35 92 10 23...
$Ana: num 17.2 48 35.4 84.2
$ Clo: num 37.2 12.1 45.4 38.9
....

mean <- colMeans(cb[,2:89])
mean
WC     Ana    Clo    ...
52.45  37.23  50.12  ...

I want to perform a one-sample t-test for every combination of group and variable, testing each group's values against the overall mean of that variable.

For that I did the following:

A <- subset(cb, cb$group == "A")
B <- subset(cb, cb$group == "B")
...

t_A_WC <- t.test(A$WC, mu = mean[1], alternative = "two.sided")
t_B_WC <- t.test(B$WC, mu = mean[1], alternative = "two.sided")
....

t_A_Ana <- t.test(A$Ana, mu = mean[2], alternative = "two.sided")
t_B_Ana <- t.test(B$Ana, mu = mean[2], alternative = "two.sided")
....

t_A_Clo <- t.test(A$Clo, mu = mean[3], alternative = "two.sided")
t_B_Clo <- t.test(B$Clo, mu = mean[3], alternative = "two.sided")
....

The results are correct (or seem to be), but it is very time consuming typing the whole thing so many times.

Is there a smarter way to do that?

What I have tried:

From here

results <- lapply(mydf, t.test)
resultsmatrix <- do.call(cbind, results)
resultsmatrix[c("statistic","estimate","p.value"),]

But the results are clearly wrong and do not match the values I calculated previously.

EDIT:

Here is a link to a 10.000 row sample from the actual dataset

Upvotes: 0

Views: 2282

Answers (2)

nya

Reputation: 2250

First, let's initialise a results matrix and group levels.

res <- matrix(NA, ncol=5, 
    dimnames=list(NULL, c("group", "col", "statistic", "estimate", "p.value")))
gr <- levels(factor(cb$group))   # works whether group is a factor or a character column

Then we loop through all columns for which to calculate the t.test, subsetting each for every available group.

for(cl in 2:ncol(cb)){
    for(grp in gr){
        temp <- cb[cb$group == grp, cl]
        tt <- t.test(temp, mu = mean(cb[,cl]), alternative="two.sided")
        # pull the fields out by name so they line up with the column headers
        res <- rbind(res, c(grp, colnames(cb)[cl], 
            tt$statistic, tt$estimate, tt$p.value))
    }
}

And finally, we reformat the results table.

res <- data.frame(res[-1,])
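
Because the matrix mixes group names with numbers, every column of res ends up as character. If numeric columns are wanted, a variant along the following lines might help. This is only a sketch on the toy data from the question; the names vars, grid, rows and res_df are made up for illustration, and the seed is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed so the toy data is reproducible
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))

# every group/variable combination, one row each
vars <- setdiff(names(cb), "group")
grid <- expand.grid(group = unique(cb$group), col = vars,
                    stringsAsFactors = FALSE)

# one single-row data frame per test, so numeric fields stay numeric
rows <- lapply(seq_len(nrow(grid)), function(i) {
  g <- grid$group[i]
  v <- grid$col[i]
  tt <- t.test(cb[cb$group == g, v], mu = mean(cb[[v]]),
               alternative = "two.sided")
  data.frame(group = g, col = v,
             statistic = unname(tt$statistic),
             estimate = unname(tt$estimate),
             p.value = tt$p.value)
})

res_df <- do.call(rbind, rows)
head(res_df)
```

With the real data set the same code should apply as long as every column other than group is numeric.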

Upvotes: 1

carlo

Reputation: 131

This approach might be kind of lengthy, but I think it captures all the combinations you are looking for ("A" with "WC", "Ana", "Clo"; "B" with "WC", "Ana", "Clo"; etc.). So, all in all, 5 groups * 3 variables = 15 t-test results.

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

mean <- colMeans(cb[,2:4])
varNames <- names(cb)[-1]   # removing group variable from list of variables


# t-test results are stored in a list of lists
master <- list()

## main for loop subsets; lapply calculates t-statistics for all variables in the subset
for (group in unique(cb$group)){
  # create a list of t-test results in a given "group" subset
  results <- lapply(seq_along(varNames), FUN = function(x, subset = cb[cb$group == group,]) {
    t.test(subset[varNames[x]], mu = mean[x], alternative = "two.sided")
  })

  master[[group]] <- results
}

# so for example, if you want to find the results from group "A" and "WC" you say
master[["A"]][[1]]   # index one because "WC" is the first element of varNames

#   One Sample t-test
# 
# data:  subset[varNames[x]]
# t = -0.417, df = 19, p-value = 0.6813
# alternative hypothesis: true mean is not equal to 46.5857
# 95 percent confidence interval:
#  30.27709 57.47510
# sample estimates:
# mean of x 
#  43.87609 

# from there you can just find your relevant statistic, for example

master[["A"]][[1]]$statistic   # gives the t-statistic (eg. $statistic, $p.value, etc.)

#         t 
# -0.4170353
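
If you would rather see one statistic for all 15 tests at once, the nested list can be collapsed into a groups-by-variables matrix. The following is a rough sketch rather than part of the approach above: it rebuilds master on the toy data so that it runs on its own, and mu0, sub and pvals are names invented here (mu0 is used in place of mean to avoid masking the function name).

```r
set.seed(1)  # assumption: fixed seed so the toy data is reproducible
cb <- data.frame(group = c("A", "B", "C", "D", "E"),
                 WC = runif(100, 0, 100),
                 Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))

mu0 <- colMeans(cb[, 2:4])      # overall mean of each variable
varNames <- names(cb)[-1]

# same nested structure as above: one list of tests per group
master <- list()
for (group in unique(cb$group)) {
  sub <- cb[cb$group == group, ]
  master[[group]] <- lapply(seq_along(varNames), function(x) {
    t.test(sub[[varNames[x]]], mu = mu0[x], alternative = "two.sided")
  })
}

# inner sapply: p-values of one group; outer sapply: one column per group
pvals <- t(sapply(master, function(tests) {
  sapply(tests, function(tt) tt$p.value)
}))
colnames(pvals) <- varNames

pvals["A", "WC"]   # same value as master[["A"]][[1]]$p.value
```

Swapping tt$p.value for unname(tt$statistic) should give the matching matrix of t-statistics.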

Upvotes: 1
