Reputation: 67
I have a data frame of many numeric variables for 60 participants. Each participant has two values for each variable (before intervention and during intervention). I'd like to run a paired t-test on each variable in this data frame.
The data frame looks like this:
Log.Name fat protein carbs
before R 19 32 134
during R 21 43 167
before R 32 14 322
during R 25 32 213
before R 42 34 201
during R 34 23 305
I tried different approaches:
qw<- matrix(lapply(names(new.averages)[-1], function(x){
t.test(new.averages[new.averages$Log.Name =="before R", x],
new.averages[new.averages$Log.Name=="during R", x], mu=0, alt="two.sided", paired = F)$p.value}))
With paired = FALSE, as shown, this works, but with paired = TRUE it throws the following error:
Error in t.test.default(new.averages[new.averages$Log.Name == "before R", : not enough 'x' observations
lapply(new.averages[-1], function(x) t.test(x ~ new.averages$Log.Name, paired=F)$p.value)
This one also works when paired = FALSE, but when paired = TRUE it throws the following error:
Error in complete.cases(x, y) : not all arguments have the same length
When I run individual paired t-tests they work, but then I would spend hours running many tests when I should be able to do it in one click!
Any ideas?
Upvotes: 1
Views: 784
Reputation: 3923
You can use the formula interface and then lapply or map:
library(purrr)
# first a single case
t.test(fat ~ Log.Name, data = df, paired = TRUE)
#>
#> Paired t-test
#>
#> data: fat by Log.Name
#> t = 1.3628, df = 2, p-value = 0.3061
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -9.34823 18.01490
#> sample estimates:
#> mean of the differences
#> 4.333333
# then build a named vector of all the variables you want to test
tobetested <- names(df[-1])
names(tobetested) <- names(df[-1])
# you can use paste to build the formula on the fly
map(tobetested,
~ t.test(as.formula(paste(., "~ Log.Name")),
data = df,
paired = TRUE))
#> $fat
#>
#> Paired t-test
#>
#> data: fat by Log.Name
#> t = 1.3628, df = 2, p-value = 0.3061
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -9.34823 18.01490
#> sample estimates:
#> mean of the differences
#> 4.333333
#>
#>
#> $protein
#>
#> Paired t-test
#>
#> data: protein by Log.Name
#> t = -0.68674, df = 2, p-value = 0.5632
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -43.59182 31.59182
#> sample estimates:
#> mean of the differences
#> -6
#>
#>
#> $carbs
#>
#> Paired t-test
#>
#> data: carbs by Log.Name
#> t = -0.14906, df = 2, p-value = 0.8952
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -278.7487 260.0821
#> sample estimates:
#> mean of the differences
#> -9.333333
Your data
library(readr)
df <- read_table("Log.Name fat protein carbs
before R 19 32 134
during R 21 43 167
before R 32 14 322
during R 25 32 213
before R 42 34 201
during R 34 23 305")
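One caveat: as far as I know, R 4.4.0 and later reject paired = TRUE in the formula method of t.test (the alternatives are the default x, y interface or Pair(x, y) ~ 1). Below is a minimal sketch of the same loop using the default interface, which should behave the same across R versions. It assumes the rows of each group are in the same participant order, and it rebuilds the sample data with base R so it runs on its own. The names before, during, paired_tests and p_values are only illustrative.

```r
# Sample data from the question, rebuilt with base R.
# Assumption: within each group, rows are in the same participant order.
df <- data.frame(
  Log.Name = rep(c("before R", "during R"), times = 3),
  fat      = c(19, 21, 32, 25, 42, 34),
  protein  = c(32, 43, 14, 32, 34, 23),
  carbs    = c(134, 167, 322, 213, 201, 305)
)

# Split the numeric columns by condition (drop the Log.Name column).
before <- df[df$Log.Name == "before R", -1]
during <- df[df$Log.Name == "during R", -1]

# A paired test needs exactly one "during" row per "before" row.
# Unequal group sizes are one plausible cause of the errors in the question.
stopifnot(nrow(before) == nrow(during))

# Map() walks the two data frames column by column, so each call
# receives the "before" and "during" values of one variable.
paired_tests <- Map(function(b, d) t.test(b, d, paired = TRUE), before, during)

# Extract the p-values as a named numeric vector.
p_values <- vapply(paired_tests, function(tt) tt$p.value, numeric(1))
p_values
```

If the stopifnot() check fails on the real data, table(df$Log.Name) shows how many rows each condition has, which is usually the first thing to look at.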
Upvotes: 0
Reputation: 174478
You can pull each column of interest out of the data frame and compare the elements with an odd index to those with an even index if that's how your data are laid out:
lapply(new.averages[-1], function(x) {
t.test(x[seq_along(x) %% 2 == 1],
x[seq_along(x) %% 2 == 0], paired = TRUE)$p.value
})
#> $fat
#> [1] 0.3061113
#>
#> $protein
#> [1] 0.5631788
#>
#> $carbs
#> [1] 0.8951818
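Since the odd/even approach depends entirely on the row layout, it may be worth checking that layout before trusting the results. The following is a sketch under the assumption that odd rows are "before R", even rows are "during R", and each adjacent pair belongs to the same participant. The sample data has no participant ID, so the last part cannot be checked here. It also collects the test statistic, degrees of freedom and p-value into one data frame; the names odd and results are illustrative.

```r
# Sample data from the question, in the alternating layout assumed here.
new.averages <- data.frame(
  Log.Name = rep(c("before R", "during R"), times = 3),
  fat      = c(19, 21, 32, 25, 42, 34),
  protein  = c(32, 43, 14, 32, 34, 23),
  carbs    = c(134, 167, 322, 213, 201, 305)
)

# TRUE for rows 1, 3, 5, ...; FALSE for rows 2, 4, 6, ...
odd <- seq_len(nrow(new.averages)) %% 2 == 1

# Layout check: stop early if the labels do not alternate as assumed.
stopifnot(
  all(new.averages$Log.Name[odd]  == "before R"),
  all(new.averages$Log.Name[!odd] == "during R")
)

# One paired t-test per numeric column, gathered into a single data frame.
results <- do.call(rbind, lapply(names(new.averages)[-1], function(v) {
  x  <- new.averages[[v]]
  tt <- t.test(x[odd], x[!odd], paired = TRUE)
  data.frame(
    variable = v,
    t        = unname(tt$statistic),
    df       = unname(tt$parameter),
    p.value  = tt$p.value
  )
}))
results
```

With one row per variable, the table can be sorted or filtered by p.value directly, which is handy once there are many columns.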
Upvotes: 1