Reputation: 23
I have data showing beta values (neural activity) as a continuous DV and a categorical IV (Position) with several levels (1, 2, 3). I have created a summary dataframe that includes the means for each condition.
df_sum <- df %>%
  group_by(Position) %>%
  summarise(
    n = n(),
    mean = mean(Beta),
    sd = sd(Beta)
  ) %>%
  mutate(se = sd / sqrt(n)) %>%
  mutate(ic = se * qt((1 - 0.05) / 2 + .5, n - 1))
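(As an aside, the quantile expression in the last step is just the usual upper 97.5% t quantile, since (1 - 0.05)/2 + 0.5 = 0.975; a quick check, with an arbitrary n = 12:)

```r
# (1 - 0.05)/2 + 0.5 = 0.975, the standard 95% CI multiplier point
n <- 12
stopifnot(all.equal(qt((1 - 0.05) / 2 + .5, n - 1), qt(0.975, n - 1)))
```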
This is a reanalysis of some old data originally analysed in SPSS, and the means from the summarise function agree with those previously reported. However, when I run t-tests with either emmeans
emmeans(lm(Beta ~ Position, data = df), pairwise ~ Position)
or t.test
t.test(df$Beta[df$Position=="1"], df$Beta[df$Position=="2"])
the output reports means that are the same as each other but slightly different to the ones calculated by summarise and reported by the original study. Why is this happening (or really, how??), and which should I use?
Upvotes: 1
Views: 180
Reputation: 6830
I suspect that you are seeing means that are identical, but that the confidence intervals for them differ. That's because the emmeans results are based on the model fitted by lm, which estimates a common error SD with more degrees of freedom, whereas t.test (with its default var.equal = FALSE) estimates a separate SD for each group. Either set of results is fine, but they are based on different models. If the SDs don't differ statistically, you are better off with the lm results because they make more efficient use of the data.
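To see what this means concretely, here is a sketch with simulated data (the original df isn't shown, so the numbers are illustrative only): the group means agree exactly between the two approaches, but the SE of a pairwise difference is built from a pooled residual SD under lm/emmeans versus per-group SDs under Welch's t.test.

```r
set.seed(1)
df <- data.frame(
  Position = factor(rep(c("1", "2", "3"), each = 20)),
  Beta     = rnorm(60)
)
fit <- lm(Beta ~ Position, data = df)

# Group means from the model match the raw group means exactly
grp_means <- tapply(df$Beta, df$Position, mean)
stopifnot(all.equal(as.numeric(grp_means),
                    as.numeric(tapply(fitted(fit), df$Position, mean))))

# The residual SD from lm is the pooled SD: a weighted average of
# the group variances, with df = N - k = 60 - 3 = 57
sp <- sqrt(sum(tapply(df$Beta, df$Position,
                      function(x) (length(x) - 1) * var(x))) / (60 - 3))
stopifnot(all.equal(sp, sigma(fit)))

# emmeans-style SE of the "1" vs "2" difference: pooled SD
n1 <- n2 <- 20
se_pooled <- sigma(fit) * sqrt(1 / n1 + 1 / n2)

# t.test()-style Welch SE (default var.equal = FALSE): per-group SDs
s1 <- sd(df$Beta[df$Position == "1"])
s2 <- sd(df$Beta[df$Position == "2"])
se_welch <- sqrt(s1^2 / n1 + s2^2 / n2)

# The two SEs (and hence the CIs) generally differ slightly
c(se_pooled = se_pooled, se_welch = se_welch)
```

With equal group sizes the two SEs are often close, but they are computed from different variance estimates, which is exactly the discrepancy described above.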
You can reproduce the SEs provided by emmeans as follows:
df_sum %>%
  mutate(sp = sqrt(sum((n - 1) * sd^2) / sum(n - 1))) %>%
  group_by(Position) %>%
  mutate(se.p = sp / sqrt(n))
In these steps, we compute sp, the pooled SD, via a weighted average of the variances, and then se.p, the SE for each group mean based on that pooled SD.
Upvotes: 2