Reputation: 75
I'd like to perform an independent t.test within groups of a data frame:
   eyecolor suncream moles
1      blue        x    10
2      blue        x     9
3      blue        x     6
4      blue        y    15
5      blue        y     7
6      blue        y     3
7     brown        x     9
8     brown        x     6
9     brown        x     4
10    brown        y     1
11    brown        y     2
12    brown        y     1
That means (1) subsetting by eyecolor and (2) performing a t.test on the number of moles for suncream x vs. y. I can already compute group means with dplyr, e.g.:
df %>% group_by(eyecolor, suncream) %>% summarize(moles.mean = mean(moles))
Just to make it clear, I would like to get a p-value comparing suncream x and y for every eyecolor.
Upvotes: 0
Views: 1220
Reputation: 4417
Don't make it too complicated with dplyr. It is not friendly to the formula interface of t.test, which is very helpful in this particular situation. HEITZ has given a dplyr answer. Compare how the version without dplyr is not only more idiomatic but also shorter and has fewer nested parentheses:
by(df, df$eyecolor, function(subs) t.test(subs$moles ~ subs$suncream))
or, if you really only want to see the p-values:
by(df, df$eyecolor, function(subs) t.test(subs$moles ~ subs$suncream)$p.value)
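For anyone who wants to try this directly, here is a minimal self-contained sketch. It rebuilds the data frame from the question and uses split() with sapply() instead of by(), which is my own substitution so that the result comes back as a plain named vector. The name p_values is just illustrative. Note that t.test defaults to the Welch (unequal variance) test.

```r
# Rebuild the example data from the question
df <- data.frame(
  eyecolor = rep(c("blue", "brown"), each = 6),
  suncream = rep(rep(c("x", "y"), each = 3), times = 2),
  moles    = c(10, 9, 6, 15, 7, 3, 9, 6, 4, 1, 2, 1)
)

# Split by eyecolor, then run a two-sample t-test (x vs. y) in each subset
# using the formula interface with the data argument
p_values <- sapply(
  split(df, df$eyecolor),
  function(subs) t.test(moles ~ suncream, data = subs)$p.value
)

# Named numeric vector: one p-value per eyecolor
print(p_values)
```

Passing data = subs keeps the formula free of the subs$ prefixes, which is a matter of taste; the by() version above does the same tests.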
Upvotes: 0
Reputation: 136
This should probably be treated in an ANOVA context. Also, the OP should take some time to digest fundamentals of null hypothesis testing and t-tests if the answer is not clear. That said, here is an answer:
results = df %>% group_by(eyecolor) %>% summarize(p = t.test(moles[which(suncream == 'x')],moles[which(suncream=='y')])$p.value)
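As for the ANOVA remark, a rough sketch of what that could look like, assuming a two-way model with an interaction term is what is meant (that is my reading, and the name fit is just illustrative). The data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  eyecolor = rep(c("blue", "brown"), each = 6),
  suncream = rep(rep(c("x", "y"), each = 3), times = 2),
  moles    = c(10, 9, 6, 15, 7, 3, 9, 6, 4, 1, 2, 1)
)

# Two-way ANOVA: main effects of eyecolor and suncream plus their interaction
fit <- aov(moles ~ eyecolor * suncream, data = df)

# The eyecolor:suncream row indicates whether the suncream effect
# differs between eye colors
print(summary(fit))
```

This fits one model to all twelve rows instead of running a separate test per eyecolor, so it answers a slightly different question than the per-group p-values.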
Upvotes: 1