Emman
Emman

Reputation: 4201

Shade an area under density curve, to mark the Highest Density Interval (HDI)

I thought this should be straightforward, but I'm lost, despite tons of information online.

My Problem: I have a vector of data points, for which I want to plot a density curve, then color the area under the curve to signify the Highest Density Interval (HDI). Naturally, I'm trying to achieve this with ggplot2 package, and specifically with qplot(), since my data comes as a vector, and not a data frame.

Reproducible Example

library(ggplot2)
library(HDInterval)

## create data vector
set.seed(789)
dat <- rnorm(1000)

## plot density curve with qplot and mark 95% hdi
qplot(dat, geom = "density")+ 
  geom_vline(aes(xintercept = c(hdi(dat))))

So I get this:

density_curve

But what I really want is something like this:

what_i_want

Is there a simple way to achieve this with ggplot2::qplot?

Upvotes: 3

Views: 1367

Answers (2)

Famondir
Famondir

Reputation: 55

When I read this post I was realy grateful for your answer, Wilke. But I wondered how to adjust the credible mass of the hdi. Finally I found a solution! When I figuered out where the quantiles argument comes from (I access it via dots[[2]]) it clicked the trigger. I wrote the following function (because passing the value to HDInterval::hdi didn't work out of the box):

hdi_custWidth <- function(...) {
  dots <- list(...)
  quantiles <- dots[[2]]
  hdi_width <- quantiles[[length(quantiles)]] # uses the last entry if its a vector which should be the biggest one; better pass a single double < 1.0
  if (is.na(hdi_width)) hdi_width <- .89 # happens is quantiles = 1L
  message(paste0('HDI credible interval width = ', hdi_width))
  HDInterval::hdi(dots[[1]], credMass = hdi_width)
}

You can use it to alter the repex from the post above:

library(tidyverse)
library(HDInterval)
library(ggridges)
#> 
#> Attaching package: 'ggridges'
#> The following object is masked from 'package:ggplot2':
#> 
#>     scale_discrete_manual

## create data vector
set.seed(789)
dat <- rnorm(1000)

df <- tibble(dat)

## plot density curve with qplot and mark 95% hdi
ggplot(df, aes(x = dat, y = 0, fill = stat(quantile))) + 
  geom_density_ridges_gradient(quantile_lines = TRUE, quantile_fun = hdi_custWidth, quantiles = .90, vline_linetype = 2) +
  scale_fill_manual(values = c("transparent", "lightblue", "transparent"), guide = "none")
#> Picking joint bandwidth of 0.227

Of course you can pick any value between 0 and 1 in the quantiles argument (not just .90) and get the corresponding hdi credible mass.

Upvotes: 1

Claus Wilke
Claus Wilke

Reputation: 17790

You can do this with the ggridges package. The trick is that we can provide HDInterval::hdi as quantile function to geom_density_ridges_gradient(), and that we can fill by the "quantiles" it generates. The "quantiles" are the numbers in the lower tail, in the middle, and in the upper tail.

As a general point of advice, I would recommend against using qplot(). It's more likely going to cause confusion, and putting a vector into a tibble is not a lot of effort.

library(tidyverse)
library(HDInterval)
library(ggridges)
#> 
#> Attaching package: 'ggridges'
#> The following object is masked from 'package:ggplot2':
#> 
#>     scale_discrete_manual

## create data vector
set.seed(789)
dat <- rnorm(1000)

df <- tibble(dat)

## plot density curve with qplot and mark 95% hdi
ggplot(df, aes(x = dat, y = 0, fill = stat(quantile))) + 
  geom_density_ridges_gradient(quantile_lines = TRUE, quantile_fun = hdi, vline_linetype = 2) +
  scale_fill_manual(values = c("transparent", "lightblue", "transparent"), guide = "none")
#> Picking joint bandwidth of 0.227

Created on 2019-12-24 by the reprex package (v0.3.0)

The colors in scale_fill_manual() are in the order of the three groups, so if you, for example, only wanted to shade the left tail, you would write values = c("lightblue", "transparent", "transparent").

Upvotes: 5

Related Questions