Reputation: 5
When executing the following piece of code, the output plot shows a blue line of f(x) = 0
, instead of the Gamma pdf (see the blue line in this picture).
analyzeGamma <- function(csvPath, alpha, beta) {
dfSamples <- read.csv(file = csvPath,
header = TRUE,
sep = ",")
base <- ggplot(dfSamples, aes(x = value, y = quantity))
base +
geom_col(color = "red") +
geom_vline(xintercept = qgamma(seq(0.1, 0.9, by = 0.1), alpha, beta)) +
stat_function(
fun = dgamma,
args = list(shape = alpha, rate = beta),
colour = "blue"
)
}
path = "/tmp/data.csv"
alpha = 1.2
beta = 0.01
analyzeGamma(path, alpha, beta)
When I comment out the line:
geom_col(color = "red") +
The Gamma pdf is drawn correctly, as can be seen here.
Any idea why it happens and how to resolve?
Thanks.
Upvotes: 0
Views: 474
Reputation: 1252
ggplot scales the y-axis to show all data. The blue curve appears as a straight line due do scale - if you compare the scale of the y-axis in both charts you'll see: when you draw the geom_col
the y axis maximum is somewhere at 25 (and stat_functions seems to be a straigh line). Without the geom_col, y-axis max is somewhere at 0.006.
Upvotes: 0
Reputation: 37903
It's because your geom_col()
goes up to 25 and probability density functions have an integral of 1. If I'm correct in assuming your columns resemble a histogram with count data as quantities
, you would have to scale your density to match the columns as follows:
density * number of samples * width of columns
If you've precomputed the columns, 'number of samples' would be the sum of all your y-values.
An example with some toy data, notice the function in the stat:
alpha = 1.2
beta = 0.01
df <- data.frame(x = rgamma(1000, shape = alpha, rate = beta))
binwidth <- 5
ggplot(df, aes(x)) +
geom_histogram(binwidth = binwidth) +
stat_function(
fun = function(z, shape, rate)(dgamma(z, shape, rate) * length(df$x) * binwidth),
args = list(shape = alpha, rate = beta),
colour = "blue"
)
The following example with geom_col()
gives the same picture:
x <- table(cut_width(df$x, binwidth, boundary = 0))
newdf <- data.frame(x = seq(0.5*binwidth, max(df$x), by = binwidth),
y = as.numeric(x))
ggplot(newdf, aes(x, y)) +
geom_col(width = binwidth) +
stat_function(
fun = function(z, shape, rate)(dgamma(z, shape, rate) * sum(newdf$y) * binwidth),
args = list(shape = alpha, rate = beta),
colour = "blue"
)
Upvotes: 1