ggplot2: overlaying stat_function() with geom_col()

Question

When executing the following piece of code, the output plot shows a blue line of f(x) = 0, instead of the Gamma pdf (see the blue line in this picture).

analyzeGamma <- function(csvPath, alpha, beta) {
  dfSamples <- read.csv(file = csvPath,
                        header = TRUE,
                        sep = ",")
  base <- ggplot(dfSamples, aes(x = value, y = quantity))
  base + 
    geom_col(color = "red") +
    geom_vline(xintercept = qgamma(seq(0.1, 0.9, by = 0.1), alpha, beta)) +
    stat_function(
      fun = dgamma,
      args = list(shape = alpha, rate = beta),
      colour = "blue"
    )
}

path = "/tmp/data.csv"
alpha = 1.2
beta = 0.01

analyzeGamma(path, alpha, beta)

When I comment out the line:

geom_col(color = "red") +

The Gamma pdf is drawn correctly, as can be seen here.

Any idea why it happens and how to resolve?

Thanks.

teunbrand · Accepted Answer

It's because your geom_col() goes up to 25 and probability density functions have an integral of 1. If I'm correct in assuming your columns resemble a histogram with count data as quantities, you would have to scale your density to match the columns as follows:

density * number of samples * width of columns

If you've precomputed the columns, 'number of samples' would be the sum of all your y-values.

An example with some toy data, notice the function in the stat:

alpha = 1.2
beta = 0.01

df <- data.frame(x = rgamma(1000, shape = alpha, rate = beta))

binwidth <- 5

ggplot(df, aes(x)) +
  geom_histogram(binwidth = binwidth) +
  stat_function(
    fun = function(z, shape, rate)(dgamma(z, shape, rate) * length(df$x) * binwidth),
    args = list(shape = alpha, rate = beta),
    colour = "blue"
  )

The following example with geom_col() gives the same picture:

x <- table(cut_width(df$x, binwidth, boundary = 0))
newdf <- data.frame(x = seq(0.5*binwidth, max(df$x), by = binwidth),
                    y = as.numeric(x))

ggplot(newdf, aes(x, y)) +
  geom_col(width = binwidth) +
  stat_function(
    fun = function(z, shape, rate)(dgamma(z, shape, rate) * sum(newdf$y) * binwidth),
    args = list(shape = alpha, rate = beta),
    colour = "blue"
  )

ggplot2: overlaying stat_function() with geom_col()

Answers (2)

Related Questions