Pat W.

Reputation: 1831

"Density" curve overlay on histogram where vertical axis is frequency (aka count) or relative frequency?

Is there a way to overlay something analogous to a density curve when the vertical axis is frequency or relative frequency? (It would not be an actual density function, since the curve need not integrate to 1.) A similar question, "ggplot2: histogram with normal curve", has a self-answer that scales ..count.. inside geom_density(). However, this seems unusual.

The following code produces an overinflated "density" line.

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
hist.1a        <- ggplot(df1, aes(v)) + 
                    stat_bin(aes(y = ..count..), color = "black", fill = "blue",
                             breaks = b1) + 
                    geom_density(aes(y = ..count..))
hist.1a


Upvotes: 16

Views: 17079

Answers (3)

Stéphane Laurent

Reputation: 84529

library(ggplot2)
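smoothedHistogram <- function(dat, y, bins=30, xlabel = y, ...){
  # Histogram of column `y` (given as a string) on the count scale
  gg <- ggplot(dat, aes_string(y)) + 
    geom_histogram(bins=bins, center = 0.5, stat="bin", 
                   fill = I("midnightblue"), color = "#E07102", alpha=0.8) 
  # Build the plot to get the computed bars, then sum their areas
  gg_build <- ggplot_build(gg)
  area <- sum(with(gg_build[["data"]][[1]], y*(xmax - xmin)))
  # The density integrates to 1, so multiplying by the total bar area
  # puts the curve on the same scale as the counts
  gg <- gg + 
    stat_density(aes(y=..density..*area), 
                 color="#BCBD22", size=2, geom="line", ...)
  # Swap the layer order so the line is drawn first, underneath the bars
  gg$layers <- gg$layers[2:1]
  gg + xlab(xlabel) +  
    theme_bw() + theme(axis.title = element_text(size = 16),
                       axis.text = element_text(size = 12))
}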

dat <- data.frame(x = rnorm(10000))
smoothedHistogram(dat, "x")
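
The function above draws the histogram, measures the total bar area, and multiplies the density by that area. In recent ggplot2 releases, aes_string(), the ..density.. notation, and size for lines are deprecated, so here is a minimal sketch of the same idea with after_stat(), .data and linewidth. It assumes ggplot2 3.4.0 or later; the name smoothed_histogram and the reduced argument list are illustrative and not part of the original answer.

```r
library(ggplot2)

# Hypothetical variant of smoothedHistogram() using current ggplot2 idioms.
smoothed_histogram <- function(dat, y, bins = 30) {
  # Histogram of the column named by the string `y`, on the count scale
  gg <- ggplot(dat, aes(x = .data[[y]])) +
    geom_histogram(bins = bins, fill = "midnightblue",
                   color = "#E07102", alpha = 0.8)

  # Total bar area = sum over bins of count * bin width
  bars <- layer_data(gg, 1)
  area <- sum(bars$count * (bars$xmax - bars$xmin))

  # The density integrates to (about) 1, so scaling by `area`
  # puts the curve on the count scale
  gg +
    stat_density(aes(y = after_stat(density * area)), geom = "line",
                 color = "#BCBD22", linewidth = 2) +
    xlab(y)
}

set.seed(1)  # arbitrary seed, only for reproducibility
p <- smoothed_histogram(data.frame(x = rnorm(10000)), "x")
p
```

Unlike the original, this sketch leaves the line drawn on top of the bars and omits the theme settings.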


Upvotes: 2

Pat W.

Reputation: 1831

@joran's response/comment got me thinking about what the appropriate scaling factor would be. For posterity's sake, here's the result.

When Vertical Axis is Frequency (aka Count)

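density = count / (N * bin width), where count is the number of observations in a bin and N is the total number of observations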

Thus, the scaling factor for a vertical axis measured in bin counts is

N * (bin width)

In this case, with N = 164 observations and a bin width of 0.1, the y aesthetic for the smoothed line should be:

y = ..density..*(164 * 0.1)
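
As a quick numeric check of this factor, base R's hist() reports both the count and the density of each bin, so multiplying the density by N times the bin width should give back the counts. This is a minimal sketch with simulated data; the seed and the breaks are arbitrary choices, not taken from the post.

```r
# Simulated data in the spirit of the question; the seed is arbitrary.
set.seed(1)
v <- rnorm(164, mean = 9, sd = 1.5)
N <- length(v)
bin_width <- 0.1

# Breaks chosen to cover every observation, so no points are dropped.
breaks <- seq(floor(min(v)), ceiling(max(v)), by = bin_width)
h <- hist(v, breaks = breaks, plot = FALSE)

# hist() defines density as counts / (N * bin width), so multiplying
# the density by N * bin width should recover the counts.
scale_factor <- N * bin_width
counts_match <- isTRUE(all.equal(h$density * scale_factor,
                                 as.numeric(h$counts)))
print(counts_match)
```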

Thus the following code produces a "density" line scaled for a histogram measured in frequency (aka count).

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
hist.1a        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(164*0.1)))
hist.1a
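
The constants 164 and 0.1 can also be computed from the data instead of typed in. The sketch below does that and uses after_stat(), which replaces the ..count.. notation in ggplot2 3.4.0 and later. The names bin_width, scale_factor and hist_count are illustrative, and the breaks are chosen here to span the whole sample.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))

bin_width <- 0.1
# Breaks spanning the full data range, so every observation lands in a bin
b1 <- seq(floor(min(df1$v)), ceiling(max(df1$v)), by = bin_width)

# Scaling factor for a count axis: N * bin width
scale_factor <- nrow(df1) * bin_width

hist_count <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count)), breaks = b1,
                 fill = "blue", color = "black") +
  geom_density(aes(y = after_stat(density * scale_factor)))
hist_count
```

One caveat: the factor N * (bin width) assumes equal-width bins that contain every observation. With breaks fixed at 4.5 to 12, any simulated values beyond 12 would fall outside the bins, so the bars would sum to slightly less than N while the curve is still estimated from all N points.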


When Vertical Axis is Relative Frequency

relative frequency = count / N = density * (bin width)

Using the above, we could write

hist.1b        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count../164), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(0.1)))
hist.1b
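
A variant that avoids hard-coding 164 is to divide each count by the total count inside after_stat(). This is a sketch assuming ggplot2 3.4.0 or later and breaks that cover all observations; bin_width and hist_relfreq are illustrative names.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))

bin_width <- 0.1
# Breaks spanning the full data range, so sum(count) equals nrow(df1)
b1 <- seq(floor(min(df1$v)), ceiling(max(df1$v)), by = bin_width)

hist_relfreq <- ggplot(df1, aes(x = v)) +
  # Relative frequency: each bin's count divided by the total count
  geom_histogram(aes(y = after_stat(count / sum(count))), breaks = b1,
                 fill = "blue", color = "black") +
  # Density times bin width is on the same relative-frequency scale
  geom_density(aes(y = after_stat(density * bin_width)))
hist_relfreq
```

With this mapping the bar heights sum to 1, whereas the area under the curve is approximately equal to the bin width.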


When Vertical Axis is Density

hist.1c        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..density..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..))
hist.1c


Upvotes: 32

joran

Reputation: 173547

Try this instead:

ggplot(df1,aes(x = v)) + 
   geom_histogram(aes(y = ..ncount..)) + 
   geom_density(aes(y = ..scaled..))
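
For context, ncount is the bin count rescaled so that the tallest bar equals 1, and scaled is the density estimate rescaled so that its peak equals 1. Both layers therefore share a 0-to-1 axis, which is no longer a count or a relative frequency. A small sketch that checks this, assuming ggplot2 3.4.0 or later for after_stat(); the data, seed and binwidth are illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))

p_scaled <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(ncount)), binwidth = 0.1) +
  geom_density(aes(y = after_stat(scaled)))

# Heights of the tallest bar and of the curve's peak after rescaling
tallest_bar <- max(layer_data(p_scaled, 1)$y)
curve_peak  <- max(layer_data(p_scaled, 2)$y)
print(c(tallest_bar, curve_peak))
```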

Upvotes: 7
