Reputation: 19
We have two parameters for multiple samples that we want to visualize in a dotplot. In our research group, people will commonly put confidence intervals on the figure (for example, a summary of biological replicates), in both the x and y direction.
I was looking into confidence ellipses (like the stat_ellipse function in R), which I expected to do the same thing, but to my surprise the ellipses are much larger than my regular confidence intervals, even with the stringent Bonferroni adjustment.
Am I misinterpreting the meaning of the ellipses, or are my regular confidence intervals wrong? What is the most appropriate way of including confidence intervals in 2D?
Reproducible example
Looking at the Sepal.Length and Petal.Length parameters from the iris dataset. The code below generates a plot with both confidence ellipses and regular confidence intervals (based on the t-test, Bonferroni adjusted). I would naïvely expect the ellipses to be enclosed by the confidence intervals, but this is clearly not the case.
library(dplyr)
library(tidyr)
library(ggplot2)

n_comparisons = 6 # 2*3 comparisons

# Generating Bonferroni confidence intervals
# This uses the t.test function, setting conf.level to the Bonferroni-adjusted
# value, and then extracting the confidence intervals with $conf.int
metrics = iris |>
  pivot_longer(Sepal.Length:Petal.Length) |>
  group_by(Species, name) |>
  summarise(mean = mean(value),
            upper_ci = t.test(value, conf.level = 1-(0.05/n_comparisons))$conf.int[2],
            lower_ci = t.test(value, conf.level = 1-(0.05/n_comparisons))$conf.int[1]) |>
  pivot_wider(values_from = mean:lower_ci)

ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) +
  theme_classic() +
  geom_point() +
  stat_ellipse(level = 0.95,
               type = "t") +
  geom_errorbar(data = metrics,
                aes(x = mean_Sepal.Length,
                    y = mean_Petal.Length,
                    ymin = lower_ci_Petal.Length,
                    ymax = upper_ci_Petal.Length),
                width = 0.2) +
  geom_errorbarh(data = metrics,
                 aes(x = mean_Sepal.Length,
                     y = mean_Petal.Length,
                     xmin = lower_ci_Sepal.Length,
                     xmax = upper_ci_Sepal.Length),
                 height = 0.2)
Upvotes: 0
Views: 107
Reputation: 132969
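I believe you are confusing the standard error (confidence interval of the mean) with the standard deviation (quantiles of the population). stat_ellipse draws a data ellipse, which is meant to cover roughly 95% of the observations, whereas your t.test intervals describe the uncertainty of each group mean and shrink as the sample size grows.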
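To make the difference concrete, here is a small sketch using only base R and the setosa sepal lengths from iris. The variable names (`sd_x`, `se_x`, and so on) are mine, chosen for illustration; the point is only that the two kinds of interval differ by a factor of about sqrt(n).

```r
# Setosa sepal lengths from the built-in iris data
x <- iris$Sepal.Length[iris$Species == "setosa"]
n <- length(x)

# Spread of the individual observations (what a data ellipse reflects)
sd_x <- sd(x)

# Uncertainty of the estimated mean (what t.test()$conf.int reflects)
se_x <- sd_x / sqrt(n)

# Half-widths of the two kinds of 95% interval
pop_half_width  <- qnorm(0.975) * sd_x             # normal quantiles of the population
mean_half_width <- qt(0.975, df = n - 1) * se_x    # t-based CI of the mean

c(sd = sd_x, se = se_x, ratio = sd_x / se_x)
c(population = pop_half_width, mean = mean_half_width)
```

With 50 observations per species, the standard deviation is sqrt(50), roughly 7 times, larger than the standard error, which is why the ellipses dwarf the error bars in the question.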
This looks reasonable:
library(dplyr)
library(tidyr)
library(ggplot2)

n_comparisons = 6 # 2*3 comparisons

# Intervals based on the standard deviation (normal quantiles of the
# population), not on the standard error of the mean
metrics = iris |>
  pivot_longer(Sepal.Length:Petal.Length) |>
  group_by(Species, name) |>
  summarise(mean = mean(value),
            upper_ci = mean(value) + qnorm(0.975) * sd(value),
            lower_ci = mean(value) + qnorm(0.025) * sd(value)) |>
  pivot_wider(values_from = mean:lower_ci)

ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) +
  theme_classic() +
  geom_point() +
  stat_ellipse(level = 0.95,
               type = "t") +
  geom_errorbar(data = metrics,
                aes(x = mean_Sepal.Length,
                    y = mean_Petal.Length,
                    ymin = lower_ci_Petal.Length,
                    ymax = upper_ci_Petal.Length),
                width = 0.2) +
  geom_errorbarh(data = metrics,
                 aes(x = mean_Sepal.Length,
                     y = mean_Petal.Length,
                     xmin = lower_ci_Sepal.Length,
                     xmax = upper_ci_Sepal.Length),
                 height = 0.2)
You should read help("dataEllipse", package = "car").
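If what you actually want is a 2D confidence region for the group means rather than a data ellipse, one option is an ellipse based on Hotelling's T². The sketch below is not something stat_ellipse does for you; `mean_ellipse` is a hypothetical helper written for this illustration, and it assumes the data within each species are roughly bivariate normal.

```r
library(ggplot2)

# Hypothetical helper: points on the confidence ellipse for a bivariate mean,
# using Hotelling's T^2 (assumes approximately bivariate-normal data).
mean_ellipse <- function(x, y, level = 0.95, n_points = 100) {
  xy <- cbind(x, y)
  n <- nrow(xy)
  p <- ncol(xy)
  centre <- colMeans(xy)
  S <- cov(xy)
  # Squared Mahalanobis radius of the region for the mean:
  # p (n - 1) / (n (n - p)) * F quantile with (p, n - p) degrees of freedom
  r2 <- p * (n - 1) / (n * (n - p)) * qf(level, p, n - p)
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- cbind(cos(theta), sin(theta))
  # chol(S) returns R with t(R) %*% R == S, so circle %*% R lies on u' S^-1 u = 1
  pts <- sweep(sqrt(r2) * circle %*% chol(S), 2, centre, "+")
  data.frame(x = pts[, 1], y = pts[, 2])
}

# One ellipse per species
ellipses <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1],
             mean_ellipse(d$Sepal.Length, d$Petal.Length))
}))

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  theme_classic() +
  geom_point(alpha = 0.4) +
  stat_ellipse(level = 0.95, type = "t", linetype = "dashed") +  # data ellipse
  geom_path(data = ellipses, aes(x = x, y = y))                  # ellipse for the mean
```

The dashed data ellipses describe where the observations fall, while the solid ellipses describe where the true means plausibly lie; the latter are much smaller because their radius scales with 1/sqrt(n).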
Upvotes: 3