Max

Reputation: 185

Multiple normal distributions by factor in ggplot facet_wrap()

I have the following code, and it works fine, except that I can't manage to pass the correct mean and sd of each level of the factor variable to stat_function(), so that the appropriate normal distribution curve is drawn over each facet's histogram.

library(ggplot2)

p <- ggplot(data = df, aes(x = DELY_QTY)) +
  geom_histogram(aes(y = after_stat(density)), color = "#76C0C1", fill = "#76C0C1", bins = 30) +
  # mean/sd are computed over the whole data frame, so every facet gets the same curve
  stat_function(fun = dnorm, args = list(mean = mean(df$DELY_QTY), sd = sd(df$DELY_QTY)),
                color = "#C10534", size = 2, alpha = 0.75) +
  stat_density(geom = "line", color = "#1A476F", size = 2, alpha = 0.75) +
  facet_wrap(~PIA_ITEM, scales = "free")

The internal structure of the data frame looks like this:

'data.frame':   66333 obs. of  2 variables:
 $ PIA_ITEM: Factor w/ 7 levels "GH26 2.6t Typ 1172-89",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ DELY_QTY: int  43 37 41 73 34 53 47 51 43 34 ...

How can I write list(mean=mean(df$DELY_QTY), sd=sd(df$DELY_QTY)) so that each facet uses the mean and sd of its own PIA_ITEM level?

Here is a sample of the data (dput() output):

structure(list(PIA_ITEM = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("GH26 2.6t Typ 1172-89", 
"GH26 11,6t Typ 3611", "GH26 13,6t Typ 3621", "GH26 5,9t Typ 3613", 
"GH26 29,0t Typ 3615", "GH26 24,0t Typ 3625", "GH26 5,2t Typ 3630"
), class = "factor"), DELY_QTY = c(43L, 37L, 41L, 73L, 34L, 53L, 
47L, 51L, 43L, 34L, 30L, 44L, 51L, 84L, 16L, 24L, 12L, 11L, 20L, 
20L)), row.names = c(NA, 20L), class = "data.frame")
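To illustrate the problem with the sample data above: mean(df$DELY_QTY) and sd(df$DELY_QTY) are computed over all rows, so every facet receives the same pooled parameters. A minimal base-R sketch comparing them with the per-item values (droplevels() is used because this sample only contains two of the seven factor levels):

```r
# Sample data from the question (dput() output)
df <- structure(list(PIA_ITEM = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("GH26 2.6t Typ 1172-89",
  "GH26 11,6t Typ 3611", "GH26 13,6t Typ 3621", "GH26 5,9t Typ 3613",
  "GH26 29,0t Typ 3615", "GH26 24,0t Typ 3625", "GH26 5,2t Typ 3630"
  ), class = "factor"), DELY_QTY = c(43L, 37L, 41L, 73L, 34L, 53L,
  47L, 51L, 43L, 34L, 30L, 44L, 51L, 84L, 16L, 24L, 12L, 11L, 20L,
  20L)), row.names = c(NA, 20L), class = "data.frame")

# What the single stat_function() call receives: one pooled mean and sd
pooled <- c(mean = mean(df$DELY_QTY), sd = sd(df$DELY_QTY))

# What each facet actually needs: the mean and sd of its own PIA_ITEM level
items <- droplevels(df$PIA_ITEM)
per_item <- data.frame(
  mean = sapply(split(df$DELY_QTY, items), mean),
  sd   = sapply(split(df$DELY_QTY, items), sd),
  row.names = levels(items)
)

print(pooled)
print(per_item)
```

The pooled mean sits between the two item means, so the single curve fits neither facet well.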

Upvotes: 1

Views: 628

Answers (2)

teunbrand

Reputation: 37933

I wrote a function at some point to address this type of issue and put it in the ggh4x package. Here is a (slightly simplified) example:

library(ggplot2)
library(ggh4x)

ggplot(data = df, aes(x = DELY_QTY)) +
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.5, bins = 30) +
  stat_density(geom = "line") +          # empirical kernel density
  stat_theodensity(colour = "red") +     # theoretical density fitted to each panel's data
  facet_wrap(~ PIA_ITEM, scales = "free")

Upvotes: 2

Allan Cameron

Reputation: 173803

If you want to do this in ggplot2 alone, you can't use stat_function(), because it puts the same curve in every facet. It is fairly easy to create the curves yourself in a small supplementary data frame instead. First, I made some sample data to try to make this more representative of your real data:

set.seed(69)

# 7 items with 100 observations each; item i has mean i * 7 + 30
df <- data.frame(DELY_QTY = do.call("c", lapply(1:7, function(x)
                 round(rnorm(100, x * 7 + 30, 10)))),
                 PIA_ITEM = rep(LETTERS[1:7], each = 100))

Now we can create the normal distribution curves:

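# One normal curve per item, using that item's own mean and sd;
# drop = TRUE skips factor levels that have no rows
df2 <- do.call("rbind", lapply(split(df, df$PIA_ITEM, drop = TRUE), function(x) {
  s <- seq(min(x$DELY_QTY), max(x$DELY_QTY), length.out = 100)  # x grid over the item's range
  data.frame(DELY_QTY = s,
             y = dnorm(s, mean(x$DELY_QTY), sd(x$DELY_QTY)),
             PIA_ITEM = x$PIA_ITEM[1])
}))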
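If the column names differ in your real data, the same idea can be wrapped in a small function. The sketch below is a hypothetical helper, normal_curves(), which is not part of ggplot2 or any package. It also passes drop = TRUE to split(), so that factor levels without rows (as in the question's dput() sample, which contains only two of the seven levels) are skipped; otherwise min() and max() return Inf/-Inf for the empty groups and seq() stops with an error.

```r
# Hypothetical helper (not from ggplot2 or any package): one normal curve
# per group, evaluated on a grid spanning that group's observed range
normal_curves <- function(data, value, group, n = 100) {
  # drop = TRUE skips factor levels that have no rows
  pieces <- split(data, data[[group]], drop = TRUE)
  out <- lapply(pieces, function(d) {
    x <- d[[value]]
    s <- seq(min(x), max(x), length.out = n)
    res <- data.frame(s, dnorm(s, mean(x), sd(x)), d[[group]][1])
    names(res) <- c(value, "y", group)
    res
  })
  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

# Made-up example data with an unused factor level "C"
toy <- data.frame(
  DELY_QTY = c(43, 37, 41, 73, 16, 24, 12, 11),
  PIA_ITEM = factor(rep(c("A", "B"), each = 4), levels = c("A", "B", "C"))
)
curves <- normal_curves(toy, "DELY_QTY", "PIA_ITEM")
head(curves)
```

With the question's data this would be called as normal_curves(df, "DELY_QTY", "PIA_ITEM"), and the result used in geom_line(data = ..., aes(y = y)) in the same way as df2.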

Then for the plot we only need to add a single geom_line in place of the stat_function:

library(ggplot2)

ggplot(data = df, aes(x = DELY_QTY)) +
  geom_histogram(aes(y = after_stat(density)), color = "#76C0C1",
                 fill = "#76C0C1", bins = 30) +
  # per-item normal curves from df2 (x is inherited from the ggplot() call)
  geom_line(data = df2, aes(y = y), color = "#C10534", size = 2, alpha = 0.75) +
  stat_density(geom = "line", color = "#1A476F", size = 2, alpha = 0.75) +
  facet_wrap(~PIA_ITEM, scales = "free")

So your actual plot would look something like this:

[Image: the resulting faceted plot]
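The overlay only lines up because the histogram is drawn on the density scale (after_stat(density), written ..density.. in older ggplot2 code): each bar height is count / (n × bin width), so the bars have a total area of 1, just like a normal density curve. A quick base-R check of that scaling, using hist() rather than ggplot2 and made-up data:

```r
set.seed(1)
x <- round(rnorm(100, mean = 50, sd = 10))   # made-up sample of 100 values

# hist() with plot = FALSE only computes counts, densities and breaks
h <- hist(x, plot = FALSE)

# density = count / (n * bin width), so the bar areas sum to 1
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)

# Raw counts instead give a total area of n * bin width (all bins share one width here)
count_area <- sum(h$counts * diff(h$breaks))
print(count_area)
```

Plotting raw counts would therefore leave dnorm() curves squashed near zero on the y axis, which is why both answers map y to the density statistic.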

Upvotes: 1
