Michiel Schreurs

Reputation: 19

What is the most appropriate way to show confidence intervals in 2D?

We have two parameters for multiple samples that we want to visualize in a dotplot. In our research group, people will commonly put confidence intervals on the figure (for example, a summary of biological replicates), in both the x and y direction.

I was looking into confidence ellipses (like the stat_ellipse function in R), which I expected to do the same thing, but to my surprise, the ellipses are much larger than my regular confidence intervals - even with the stringent Bonferroni adjustment.

Am I misinterpreting the meaning of the ellipses, or are my regular confidence intervals wrong? What is the most appropriate way of including confidence intervals in 2D?

Reproducible example

I am looking at the Sepal.Length and Petal.Length parameters from the iris dataset. The code below generates a plot with both confidence ellipses and regular confidence intervals (based on the t-test, Bonferroni adjusted). I would naïvely expect the ellipses to be enclosed by the confidence intervals, but this is clearly not the case.

library(dplyr)
library(tidyr)
library(ggplot2)

n_comparisons = 6 # 2*3 comparisons

# Generating Bonferroni-adjusted confidence intervals
# This uses the t.test function, setting conf.level to the Bonferroni-adjusted value, and then extracting the confidence limits with $conf.int
metrics = iris |> 
  pivot_longer(Sepal.Length:Petal.Length) |> 
  group_by(Species, name) |> 
  summarise(mean = mean(value),
            upper_ci = t.test(value, conf.level = 1-(0.05/n_comparisons))$conf.int[2],
            lower_ci = t.test(value, conf.level = 1-(0.05/n_comparisons))$conf.int[1]) |> 
  pivot_wider(values_from = mean:lower_ci)


ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) + 
  theme_classic() + 
  geom_point() + 
  stat_ellipse(level = 0.95,
               type = "t") + 
  geom_errorbar(data = metrics,
                aes(x = mean_Sepal.Length,
                    y = mean_Petal.Length,
                    ymin = lower_ci_Petal.Length,
                    ymax = upper_ci_Petal.Length),
                width = 0.2) +
  geom_errorbarh(data = metrics,
                aes(x = mean_Sepal.Length,
                    y = mean_Petal.Length,
                    xmin = lower_ci_Sepal.Length,
                    xmax = upper_ci_Sepal.Length),
                height = 0.2)

The iris dataset with confidence ellipses and Bonferroni-corrected confidence intervals

Upvotes: 0

Views: 107

Answers (1)

Roland

Reputation: 132969

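I believe you are confusing the standard error / confidence interval of the mean with the standard deviation / quantiles of the population. stat_ellipse draws a data ellipse, which describes where the observations fall rather than where the mean is, so it is expected to be much wider than a confidence interval of the mean.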

This looks reasonable:

library(dplyr)
library(tidyr)
library(ggplot2)

n_comparisons = 6 # 2*3 comparisons

metrics = iris |> 
  pivot_longer(Sepal.Length:Petal.Length) |> 
  group_by(Species, name) |> 
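  # mean +/- ~1.96 standard deviations: the approximate central 95% range
  # of the observations, not a confidence interval of the mean
  summarise(mean = mean(value),
            upper_ci = mean(value) + qnorm(0.975) * sd(value),
            lower_ci = mean(value) + qnorm(0.025) * sd(value)) |> 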
  pivot_wider(values_from = mean:lower_ci)


ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) + 
  theme_classic() + 
  geom_point() + 
  stat_ellipse(level = 0.95,
               type = "t") + 
  geom_errorbar(data = metrics,
                aes(x = mean_Sepal.Length,
                    y = mean_Petal.Length,
                    ymin = lower_ci_Petal.Length,
                    ymax = upper_ci_Petal.Length),
                width = 0.2) +
  geom_errorbarh(data = metrics,
                 aes(x = mean_Sepal.Length,
                     y = mean_Petal.Length,
                     xmin = lower_ci_Sepal.Length,
                     xmax = upper_ci_Sepal.Length),
                 height = 0.2)

Resulting scatter plot with data ellipses and error bars
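
To see the difference numerically, here is a minimal base R sketch for one species and one variable. The variable names (`ci_mean`, `range_data`, `inside_ci`, `inside_range`) are only illustrative, and the second interval assumes the data are roughly normal.

```r
# Sepal.Length for a single species (iris ships with base R)
x <- iris$Sepal.Length[iris$Species == "setosa"]
n <- length(x)

# 95% confidence interval of the mean: half-width scales with sd / sqrt(n)
ci_mean <- mean(x) + qt(c(0.025, 0.975), df = n - 1) * sd(x) / sqrt(n)

# Approximate central 95% range of the observations: half-width scales with sd
range_data <- mean(x) + qnorm(c(0.025, 0.975)) * sd(x)

# Share of observations that fall inside each interval
inside_ci <- mean(x >= ci_mean[1] & x <= ci_mean[2])
inside_range <- mean(x >= range_data[1] & x <= range_data[2])

print(rbind(ci_mean, range_data))
print(c(inside_ci = inside_ci, inside_range = inside_range))
```

The first interval is what t.test reports (without the Bonferroni adjustment) and shrinks as n grows; the second does not shrink, and it is the one-dimensional analogue of what the ellipse shows.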

You should read help("dataEllipse", package = "car").
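
If what you actually want is a joint confidence region for the mean, one possible approach is an ellipse based on Hotelling's T². The following is only a sketch under a bivariate normal assumption: `ellipse_points` is a hypothetical helper written for this example (it is not part of ggplot2 or car), and the data-ellipse radius is my understanding of the F-based scaling that stat_ellipse applies, here with the plain sample covariance.

```r
# Hypothetical helper: boundary points of the ellipse
# { x : (x - center)' solve(shape) (x - center) = radius^2 }
ellipse_points <- function(center, shape, radius, n_points = 100) {
  angles <- seq(0, 2 * pi, length.out = n_points)
  unit_circle <- cbind(cos(angles), sin(angles))
  # chol(shape) is the upper triangular R with t(R) %*% R == shape
  sweep(radius * unit_circle %*% chol(shape), 2, center, "+")
}

d <- as.matrix(iris[iris$Species == "setosa",
                    c("Sepal.Length", "Petal.Length")])
n <- nrow(d)
p <- ncol(d)
center <- colMeans(d)
S <- cov(d)
level <- 0.95

# Data ellipse: region expected to hold about 95% of the observations
r_data <- sqrt(p * qf(level, p, n - 1))
data_ell <- ellipse_points(center, S, r_data)

# Confidence ellipse of the mean (Hotelling's T^2):
# the covariance of the mean vector is S / n
r_conf <- sqrt(p * (n - 1) / (n - p) * qf(level, p, n - p))
conf_ell <- ellipse_points(center, S / n, r_conf)

# Quick base graphics check: points, data ellipse, confidence ellipse
plot(d, pch = 16, col = "grey50")
lines(data_ell)
lines(conf_ell, lty = 2)
```

Both results are plain matrices with the original column names, so they can be converted with as.data.frame() and drawn with geom_path() if you prefer to stay in ggplot2. The confidence ellipse is far smaller than the data ellipse, which is the same relationship as between your t-test intervals and the ellipses from stat_ellipse.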

Upvotes: 3
