Reputation: 185
I have the following code, and it works fine, except that I can't work out how to pass the mean and sd of each level of the factor variable to stat_function(), so that the appropriate normal distribution curve is drawn over the histogram in each facet.
p <- ggplot(data = df, aes(x=DELY_QTY)) +
geom_histogram(aes(x=DELY_QTY, y=..density..), color="#76C0C1", fill="#76C0C1", bins=30)+
stat_function(fun=dnorm, args = list(mean=mean(df$DELY_QTY), sd=sd(df$DELY_QTY)), color="#C10534", size=2, alpha=0.75)+
stat_density(geom = "line", color="#1A476F", size=2, alpha=0.75)+
facet_wrap(~PIA_ITEM, scales = "free")
The internal structure of the data frame looks like this:
'data.frame': 66333 obs. of 2 variables:
$ PIA_ITEM: Factor w/ 7 levels "GH26 2.6t Typ 1172-89",..: 2 2 2 2 2 2 2 2 2 2 ...
$ DELY_QTY: int 43 37 41 73 34 53 47 51 43 34 ...
How can I make the arguments
list(mean=mean(df$DELY_QTY), sd=sd(df$DELY_QTY))
use the mean and sd of each PIA_ITEM level instead of those of the whole data frame?
structure(list(PIA_ITEM = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("GH26 2.6t Typ 1172-89",
"GH26 11,6t Typ 3611", "GH26 13,6t Typ 3621", "GH26 5,9t Typ 3613",
"GH26 29,0t Typ 3615", "GH26 24,0t Typ 3625", "GH26 5,2t Typ 3630"
), class = "factor"), DELY_QTY = c(43L, 37L, 41L, 73L, 34L, 53L,
47L, 51L, 43L, 34L, 30L, 44L, 51L, 84L, 16L, 24L, 12L, 11L, 20L,
20L)), row.names = c(NA, 20L), class = "data.frame")
Upvotes: 1
Views: 628
Reputation: 37933
I wrote a function at some point to address this type of issue and put it in the package ggh4x. Here is a (slightly simplified) example:
library(ggplot2)
library(ggh4x)
ggplot(data = df, aes(x = DELY_QTY)) +
geom_histogram(aes(y = after_stat(density)),
alpha = 0.5, bins = 30) +
stat_density(geom = "line") +
stat_theodensity(colour = "red") +
facet_wrap(~ PIA_ITEM, scales = "free")
Upvotes: 2
Reputation: 173803
If you want to do this in ggplot, you can't use stat_function, because it will put the same curve in each facet: the mean and sd in args are computed once from the whole data frame. You can fairly easily create the curves yourself in a small supplementary data frame. First I have made some sample data to try to make this more representative of your real data:
set.seed(69)
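# 7 groups of 100 values each, with a different mean per group
df <- data.frame(DELY_QTY = do.call("c", lapply(1:7, function(x)
round(rnorm(100, x * 7 + 30, 10)))),
# each = 100 keeps every letter aligned with its own block of 100 draws
PIA_ITEM = rep(LETTERS[1:7], each = 100))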
Now we can create the normal distribution curves:
df2 <- do.call("rbind", lapply(split(df, df$PIA_ITEM), function(x) {
s <- seq(min(x$DELY_QTY), max(x$DELY_QTY), length.out = 100)
data.frame(DELY_QTY = s,
y = dnorm(s, mean(x$DELY_QTY), sd(x$DELY_QTY)),
PIA_ITEM = x$PIA_ITEM[1])
}))
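As a quick sanity check, here is a minimal base R sketch of the difference between the pooled parameters that stat_function was given and the per-group parameters that the curves in df2 are built from. The data frame toy and its values are made up purely for illustration and are not your data:

```r
# Toy data, invented for illustration: two groups with clearly different centres
toy <- data.frame(
  PIA_ITEM = rep(c("A", "B"), each = 4),
  DELY_QTY = c(10, 12, 14, 16, 50, 52, 54, 56)
)

# What args = list(mean = mean(df$DELY_QTY), sd = sd(df$DELY_QTY)) evaluates to:
# a single pooled value, computed once, regardless of facet
pooled_mean <- mean(toy$DELY_QTY)
pooled_sd   <- sd(toy$DELY_QTY)

# What each facet needs: one value per level of PIA_ITEM
group_mean <- tapply(toy$DELY_QTY, toy$PIA_ITEM, mean)
group_sd   <- tapply(toy$DELY_QTY, toy$PIA_ITEM, sd)

print(pooled_mean)
print(group_mean)
print(group_sd)
```

The pooled mean sits between the two groups and the pooled sd is far larger than either group's sd, which is why a single stat_function curve fits none of the facets well.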
Then for the plot we only need to add a single geom_line in place of the stat_function:
ggplot(data = df, aes(x=DELY_QTY)) +
geom_histogram(aes(x = DELY_QTY, y = ..density..), color = "#76C0C1",
fill = "#76C0C1", bins = 30) +
geom_line(data = df2, aes(y = y), color = "#C10534", size = 2, alpha = 0.75) +
stat_density(geom = "line", color = "#1A476F", size = 2, alpha = 0.75) +
facet_wrap(~PIA_ITEM, scales = "free")
So your actual plot would look something like this: each facet shows its histogram with its own normal curve and its kernel density line.
Upvotes: 1