Gilad BA
Gilad BA

Reputation: 5

ggplot2: overlaying stat_function() with geom_col()

When executing the following piece of code, the output plot shows a blue line of f(x) = 0, instead of the Gamma pdf (see the blue line in this picture).

analyzeGamma <- function(csvPath, alpha, beta) {
  dfSamples <- read.csv(file = csvPath,
                        header = TRUE,
                        sep = ",")
  base <- ggplot(dfSamples, aes(x = value, y = quantity))
  base + 
    geom_col(color = "red") +
    geom_vline(xintercept = qgamma(seq(0.1, 0.9, by = 0.1), alpha, beta)) +
    stat_function(
      fun = dgamma,
      args = list(shape = alpha, rate = beta),
      colour = "blue"
    )
}

path = "/tmp/data.csv"
alpha = 1.2
beta = 0.01

analyzeGamma(path, alpha, beta)

When I comment out the line:

geom_col(color = "red") +

The Gamma pdf is drawn correctly, as can be seen here.

Any idea why it happens and how to resolve?

Thanks.

Upvotes: 0

Views: 474

Answers (2)

Wolfgang Arnold
Wolfgang Arnold

Reputation: 1252

ggplot scales the y-axis to show all data. The blue curve appears as a straight line due do scale - if you compare the scale of the y-axis in both charts you'll see: when you draw the geom_col the y axis maximum is somewhere at 25 (and stat_functions seems to be a straigh line). Without the geom_col, y-axis max is somewhere at 0.006.

Upvotes: 0

teunbrand
teunbrand

Reputation: 37903

It's because your geom_col() goes up to 25 and probability density functions have an integral of 1. If I'm correct in assuming your columns resemble a histogram with count data as quantities, you would have to scale your density to match the columns as follows:

density * number of samples * width of columns

If you've precomputed the columns, 'number of samples' would be the sum of all your y-values.

An example with some toy data, notice the function in the stat:

alpha = 1.2
beta = 0.01

df <- data.frame(x = rgamma(1000, shape = alpha, rate = beta))

binwidth <- 5

ggplot(df, aes(x)) +
  geom_histogram(binwidth = binwidth) +
  stat_function(
    fun = function(z, shape, rate)(dgamma(z, shape, rate) * length(df$x) * binwidth),
    args = list(shape = alpha, rate = beta),
    colour = "blue"
  )

enter image description here

The following example with geom_col() gives the same picture:

x <- table(cut_width(df$x, binwidth, boundary = 0))
newdf <- data.frame(x = seq(0.5*binwidth, max(df$x), by = binwidth),
                    y = as.numeric(x))

ggplot(newdf, aes(x, y)) +
  geom_col(width = binwidth) +
  stat_function(
    fun = function(z, shape, rate)(dgamma(z, shape, rate) * sum(newdf$y) * binwidth),
    args = list(shape = alpha, rate = beta),
    colour = "blue"
  )

Upvotes: 1

Related Questions