Reputation: 135
I can do t-tests manually when the comparison is between columns, but how would I do t-tests across rows? The example data frame below shows what I mean by doing t-tests across rows.
Fruit | Sweetness Score |
---|---|
Apple | 8 |
Apple | 7 |
Apple | 8 |
Banana | 9 |
Banana | 10 |
Banana | 10 |
Banana | 10 |
Kiwi | 4 |
Kiwi | 5 |
Kiwi | 6 |
How would I do a t-test to see whether the mean sweetness of apples differs from that of bananas and of kiwis? My actual data frame is 100+ rows long and has many more categories than just these 3, but I want to figure it out row-wise for 3 items first. Also, is it possible to run t-tests automatically between all categories (Apples vs Bananas, Apples vs Kiwis, and Bananas vs Kiwis) without manually specifying the row names?
Upvotes: 1
Views: 72
Reputation: 10375
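I would do an ANOVA combined with a Tukey HSD post-hoc test. This is more robust than performing many separate t-tests, because Tukey's HSD adjusts for the multiple comparisons and keeps the family-wise error rate at the chosen level (you should of course check that the ANOVA assumptions hold in your case).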
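A minimal end-to-end sketch of that workflow, assuming the example data is entered with the column named `SweetnessScore` (no space, so it matches the formula used below). The choice of Shapiro-Wilk on the residuals and Bartlett's test for equal variances as the assumption checks is mine, not part of the original answer; with only 3 or 4 observations per group they have very little power, so treat them as rough diagnostics.

```r
# Example data from the question; the column is named SweetnessScore
# (assumption: no space) so it matches the aov() formula.
df <- data.frame(
  Fruit = factor(c(rep("Apple", 3), rep("Banana", 4), rep("Kiwi", 3))),
  SweetnessScore = c(8, 7, 8, 9, 10, 10, 10, 4, 5, 6)
)

# One-way ANOVA of sweetness on fruit
mod <- aov(SweetnessScore ~ Fruit, data = df)

# Rough assumption checks: normality of the residuals and
# homogeneity of variances across the fruit groups
shapiro.test(residuals(mod))
bartlett.test(SweetnessScore ~ Fruit, data = df)

# Overall p-value for the Fruit factor, read from the ANOVA table
anova_p <- summary(mod)[[1]][["Pr(>F)"]][1]

# Only run the post-hoc comparisons if the factor as a whole is significant
if (anova_p < 0.05) {
  print(TukeyHSD(mod))
}
```

The `if` guard encodes the "test the factor first, then compare pairs" order; the 0.05 threshold is an assumed significance level you can change.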
```r
mod <- aov(SweetnessScore ~ Fruit, data = df)
summary(mod)
```

```
            Df Sum Sq Mean Sq F value   Pr(>F)    
Fruit        2  38.68  19.342   39.63 0.000152 ***
Residuals    7   3.42   0.488                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
Always check first whether the factor as a whole is significant (here p = 0.000152), and only if it is, run the post-hoc test:
```r
TukeyHSD(mod)
```

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = SweetnessScore ~ Fruit, data = df)

$Fruit
                  diff        lwr        upr     p adj
Banana-Apple  2.083333  0.5118688  3.6547979 0.0141688
Kiwi-Apple   -2.666667 -4.3466329 -0.9867004 0.0055946
Kiwi-Banana  -4.750000 -6.3214646 -3.1785354 0.0001165
```
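If you still want the pairwise t-tests the question asks about, without typing the fruit names by hand, here is a sketch of two base R options. This is an alternative to the Tukey approach, not part of it. The Holm correction and the use of Welch (unequal-variance) t-tests via `pool.sd = FALSE` are assumptions you may want to change; the column name `SweetnessScore` is again assumed.

```r
# Example data from the question (column assumed to be named SweetnessScore)
df <- data.frame(
  Fruit = factor(c(rep("Apple", 3), rep("Banana", 4), rep("Kiwi", 3))),
  SweetnessScore = c(8, 7, 8, 9, 10, 10, 10, 4, 5, 6)
)

# Option 1: pairwise.t.test runs every pairwise comparison between the
# levels of Fruit and adjusts the p-values for multiple testing.
# pool.sd = FALSE runs a separate Welch t-test for each pair.
pairwise.t.test(df$SweetnessScore, df$Fruit,
                p.adjust.method = "holm", pool.sd = FALSE)

# Option 2: build all pairs of levels with combn() and call t.test()
# on each pair, collecting the results in one data frame.
fruit_pairs <- combn(levels(df$Fruit), 2, simplify = FALSE)
results <- do.call(rbind, lapply(fruit_pairs, function(p) {
  sub <- droplevels(df[df$Fruit %in% p, ])        # keep only the two fruits
  tt <- t.test(SweetnessScore ~ Fruit, data = sub) # Welch t-test by default
  data.frame(group1 = p[1], group2 = p[2],
             mean_diff = unname(tt$estimate[1] - tt$estimate[2]),
             p_value = tt$p.value)
}))

# Adjust the raw p-values for the number of comparisons
results$p_adj <- p.adjust(results$p_value, method = "holm")
results
```

Both options scale to any number of categories, since the pairs come from `levels(df$Fruit)`; with k categories there are k(k-1)/2 comparisons, which is why the p-value adjustment matters.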
Upvotes: 1