EGM8686

Reputation: 1572

Median statistical difference in ggplot

I have a ggplot boxplot like this one:

library(ggplot2)
data(iris)
ggplot(iris, aes(x = "", y = Sepal.Width)) +
    geom_boxplot()

As you can see, the median is 3. Suppose the real value is 3.8. I would like to know whether there is a statistically significant difference between the real value of 3.8 and the observed value of 3. Which statistical test should I use, and can I implement it in R? Also, is it possible to show the real value of 3.8 on the plot?

Thx!

PS: I'm using the iris dataset as an easily reproducible example for my real data.

Upvotes: 2

Views: 797

Answers (2)

Adam B.

Reputation: 1180

Another viable option is bootstrapping.

When you bootstrap, you draw many random samples from your original sample with replacement (meaning that an individual observation can appear more than once in a given bootstrap sample), and then compute your statistic of interest on each bootstrap sample. The great thing about the bootstrap is that you can use it to estimate a confidence interval for almost any statistic, be it the mean, median, correlation, slope in a mixed effects regression model, etc.

To implement it in R using tidyverse, you can do the following:

# Write a function that draws a random sample with replacement and
# returns your statistic of interest (i.e. the median in your case)

get_median <- function(x) {

   x_sample <- sample(x, size = length(x), replace = TRUE)
   median(x_sample)  # median of the resampled data, not of the original x

}

# After that you iterate your function many times (e.g. 1000 times) using purrr

bootstrapped_medians <- purrr::map_dbl(1:1000, ~ get_median(x = iris$Sepal.Width))

# Now you can use the vector of bootstrapped statistics to get the desired summary
# e.g. 95% confidence interval

quantile(bootstrapped_medians, c(0.025, 0.975))
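
To tie the interval back to the question, you can check whether the hypothesised value falls inside it. The following is a minimal base R sketch, not a definitive procedure: the names `true_value`, `boot_medians`, `ci`, and `inside_ci` are made up for illustration, the seed is an arbitrary choice for reproducibility, and a percentile interval is only one of several bootstrap interval types.

```r
# Sketch: percentile bootstrap interval for the median, then check
# whether the hypothesised value lies inside it.
set.seed(42)                 # arbitrary seed (assumption), for reproducibility

x <- iris$Sepal.Width
true_value <- 3.8            # hypothesised median from the question

# Resample with replacement 1000 times and store each sample's median
boot_medians <- replicate(
  1000,
  median(sample(x, size = length(x), replace = TRUE))
)

# 95% percentile interval
ci <- quantile(boot_medians, c(0.025, 0.975))
ci

# TRUE if the hypothesised value is compatible with the data at this level
inside_ci <- true_value >= ci[[1]] && true_value <= ci[[2]]
inside_ci
```

If `inside_ci` is `FALSE`, the data are not compatible with a median of 3.8 at roughly the 5% level, which is the bootstrap analogue of rejecting the null hypothesis.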

Upvotes: 3

Allan Cameron

Reputation: 173898

You are looking for a one-sample Wilcoxon signed rank test:

wilcox.test(iris$Sepal.Width, mu = 3.8)
#> 
#>  Wilcoxon signed rank test with continuity correction
#> 
#> data:  iris$Sepal.Width
#> V = 113, p-value < 2.2e-16
#> alternative hypothesis: true location is not equal to 3.8
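
One caveat: the signed rank test assumes the distribution is symmetric around its centre. If that assumption is doubtful for your real data, a sign test targets the median directly. Base R has no dedicated sign test function, so the sketch below builds one from `binom.test`; the names `mu0`, `above`, `below`, and `sign_test` are illustrative only.

```r
# Sketch: exact sign test for H0: median == mu0, built on binom.test.
x <- iris$Sepal.Width
mu0 <- 3.8                   # hypothesised median from the question

# Count observations on each side of mu0; ties with mu0 are dropped
above <- sum(x > mu0)
below <- sum(x < mu0)

# Under H0, each non-tied observation is above mu0 with probability 0.5
sign_test <- binom.test(above, above + below, p = 0.5)
sign_test$p.value
```

The sign test uses less information than the signed rank test, so it is usually less powerful, but it makes no symmetry assumption.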

You can add a horizontal line to the boxplot with geom_hline and a text label with annotate. (geom_text with a constant label would draw the label once per row of the data, giving 150 overlapping copies.)

ggplot(iris, aes(x = "", y = Sepal.Width)) +
  geom_boxplot() +
  geom_hline(yintercept = 3.8, linetype = 2) +
  annotate("text", label = "True median", x = 0.5, y = 3.9)


Upvotes: 4
