TibialCuriosity

Reputation: 47

T-tests across multiple columns or tidy the data

New to posting to Stack so apologies for any issues.

I'm getting more comfortable in R and am currently looking at using broom/purrr to run multiple statistical tests at once. An example of my current data looks like this:

Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1 30 40 6 8 12 10
2 15 12 9 13 7 7
3 20 22 11 12 9 10

But with many more subjects and more tests. I want to run a dependent (paired) t-test to see whether scores changed over the course of a training program, but I don't want to run a separate test by hand for each score.

I've seen a couple of examples of people using group_by, nest, and map to run multiple t-tests, but their data was in a longer format.

Is there a way to achieve the same goal while staying in a wide format, or will I need to use pivot_longer to reshape the data?

Thanks in advance!

ETA: I had an edit here, but it was giving incorrect results, so I have removed it. I'm still looking for some help with the arguments and same-length issue.

ETA Version 2

I did find a workaround using pairwise.t.test (code below). It gives the same p-values as running t.test on each assessment individually. I'm curious why it works for pairwise.t.test but not for t.test. Please let me know if anyone has any ideas!

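    library(tidyverse)
    library(broom)

    # testb is the wide data shown above (one row per subject)
    results <- testb %>%
      pivot_longer(-Subject,
                   names_to = c("time", "test"), values_to = "score",
                   names_pattern = "(Pre|Post)(.*)") %>%
      group_by(test) %>%
      nest() %>%
      # one paired, unadjusted comparison of Pre vs Post per test
      mutate(ttests = map(data, ~ tidy(pairwise.t.test(.x$score, .x$time,
                                                       paired = TRUE,
                                                       p.adjust.method = "none")))) %>%
      unnest(ttests)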

Upvotes: 0

Views: 1015

Answers (2)

TarJae

Reputation: 79112

Here is an attempt that does not pivot into long format. It was finished with the help of the incredible akrun; see "How to apply t.test() to multiple pairs of columns after mutate across":

library(dplyr)
library(stringr)

# For each PreScore column, test it against the PostScore column of the same name
df %>%
  summarise(across(starts_with('PreScore'),
                   ~ t.test(., get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value,
                   .names = "{.col}_TTest"))

  PreScoreTestA_TTest PreScoreTestB_TTest PreScoreTestC_TTest
1            0.767827            0.330604           0.8604162
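
One caveat on the call above: t.test() defaults to paired = FALSE, so it runs a Welch two-sample test on each Pre/Post pair rather than the dependent test the question asks for. Below is a minimal sketch of a paired variant that stays in wide format. It swaps the across()/get() lookup for a plain loop over column names with purrr::map2_dbl(), and it assumes the question's naming scheme (PreScoreTestA, PostScoreTestA, ...) with one row per subject. The object names pre_cols, post_cols and paired_p are only illustrative.

```r
library(purrr)

# Example data from the question, wide format with one row per subject
df <- data.frame(
  Subject        = c(1, 2, 3),
  PreScoreTestA  = c(30, 15, 20),
  PostScoreTestA = c(40, 12, 22),
  PreScoreTestB  = c(6, 9, 11),
  PostScoreTestB = c(8, 13, 12),
  PreScoreTestC  = c(12, 7, 9),
  PostScoreTestC = c(10, 7, 10)
)

# Find every Pre column and derive the name of its matching Post column
pre_cols  <- grep("^PreScore", names(df), value = TRUE)
post_cols <- sub("^PreScore", "PostScore", pre_cols)

# Paired t-test (Post vs Pre) for each pair of columns; keep only the p-value.
# Naming the input vector makes the result a named numeric vector.
paired_p <- map2_dbl(
  set_names(pre_cols, sub("^PreScore", "", pre_cols)),
  post_cols,
  ~ t.test(df[[.y]], df[[.x]], paired = TRUE)$p.value
)

paired_p
```

Because the pairing relies on row position, this only makes sense if each row holds the Pre and Post scores of the same subject, which is the case in wide data like this.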

Upvotes: 2

huttoncp

Reputation: 171

Yes, some pivoting is needed. Assuming you have no directional hypotheses and you want a pre-post comparison for each test, this might be what you are looking for:

library(tidyverse)

# Example data: one row per subject, one Pre and one Post column per test
df <- as.data.frame(rbind(c(1,  30, 40, 6,  8,  12, 10),
                          c(2,  15, 12, 9,  13, 7,  7),
                          c(3,  20, 22, 11, 12, 9,  10)))

names(df) <- c("Subject",
               "PrePushup", "PostPushup",
               "PreRun",    "PostRun",
               "PreJump",   "PostJump")

df %>%
  # split each column name into a time point (Pre/Post) and a test name
  pivot_longer(-Subject,
               names_to = c("time", "test"), values_to = "score",
               names_pattern = "(Pre|Post)(.*)") %>%
  group_by(test) %>%
  nest() %>%
  # one paired t-test per test, run on that test's nested data
  mutate(t_tests = map(data, ~t.test(score ~ time, data = .x, paired = TRUE))) %>%
  pull(t_tests) %>%
  purrr::set_names(c("Pushup", "Run", "Jump"))

$Pushup

    Paired t-test

data:  score by time
t = 0.79241, df = 2, p-value = 0.5112
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -13.28958  19.28958
sample estimates:
mean of the differences 
                      3 


$Run

    Paired t-test

data:  score by time
t = 2.6458, df = 2, p-value = 0.1181
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.461250  6.127916
sample estimates:
mean of the differences 
               2.333333 


$Jump

    Paired t-test

data:  score by time
t = -0.37796, df = 2, p-value = 0.7418
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.127916  3.461250
sample estimates:
mean of the differences 
             -0.3333333 
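
The formula call with paired = TRUE may depend on your R version: as far as I can tell, from R 4.4.0 onward the formula method of t.test() rejects the paired argument. Here is a sketch that avoids the formula interface by passing the Post and Pre vectors directly, and uses broom::tidy() to collect one row per test. It assumes every subject has both a Pre and a Post score for each test; paired_ttest is a hypothetical helper name, not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same example data as above
df <- data.frame(
  Subject   = 1:3,
  PrePushup = c(30, 15, 20), PostPushup = c(40, 12, 22),
  PreRun    = c(6, 9, 11),   PostRun    = c(8, 13, 12),
  PreJump   = c(12, 7, 9),   PostJump   = c(10, 7, 10)
)

# Hypothetical helper: paired t-test of Post vs Pre for one test's long data.
# Sorting by Subject keeps the two vectors aligned subject by subject.
paired_ttest <- function(d) {
  d <- arrange(d, Subject)
  t.test(d$score[d$time == "Post"], d$score[d$time == "Pre"], paired = TRUE)
}

results <- df %>%
  pivot_longer(-Subject,
               names_to = c("time", "test"), values_to = "score",
               names_pattern = "(Pre|Post)(.*)") %>%
  nest(data = -test) %>%                                   # one row per test
  mutate(tidied = map(data, ~ tidy(paired_ttest(.x)))) %>% # htest -> one-row tibble
  select(test, tidied) %>%
  unnest(tidied)

results
```

The result is a tibble with one row per test and columns such as estimate, statistic and p.value, which is easier to filter or export than a list of printed test objects.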

Upvotes: 3
