I have the following script, which makes my normal curve too small:
ggplot(exercise2d_df, aes(x = residuals_list)) +
  geom_histogram(alpha = 0.2, position = "identity") +
  stat_function(fun = dnorm,
                args = list(mean = mean(residuals_list), sd = sd(residuals_list)),
                size = 1, color = "red")
My data is:
residuals_list = c(0.183377698905335, 7.18337769890574, 1.18337769890566, 4.18337769890565, 5.18337769890565, 0.183377698905655, 3.18337769890566,-0.816622301094345, -2.81662230109434, 3.18337769890566, 8.18337769890566, 2.18337769890566, 4.18337769890565, 0.183377698905655, 5.18337769890565, -10.0541259982254, -9.05412599822537, -8.05412599822537, -5.05412599822537, -4.05412599822537, -3.05412599822537, -10.0541259982254, -6.05412599822537, -8.05412599822537, -7.05412599822537, -6.05412599822537, -7.05412599822537, -7.05412599822537, -5.05412599822537, -4.05412599822537, -3.05412599822537, -11.0541259982254, -9.05412599822537, -3.05412599822537, -1.05412599822537, -7.2916296953564, -8.2916296953564, -2.2916296953564, 0.708370304643597, -5.2916296953564, -3.2916296953564, -6.2916296953564, -2.2916296953564, 1.7083703046436, -5.2916296953564, -9.2916296953564, -5.2916296953564, -4.2916296953564, -4.2916296953564, -0.291629695356403, 1.18337769890566, -4.81662230109435, 0.183377698905655, 0.183377698905655, 0.183377698905655, 5.18337769890565, -0.816622301094345, -4.81662230109435, -3.81662230109434, -1.81662230109434, -0.816622301094345, 2.18337769890566, 3.18337769890566, 6.18337769890565, 8.18337769890566, 2.94587400177463, -3.05412599822537, 3.94587400177463, 4.94587400177463, 6.94587400177463, -0.0541259982253741, -0.0541259982253741, -0.0541259982253741, 0.945874001774626, 0.945874001774626, 0.945874001774626, 0.945874001774626, 3.94587400177463, 2.94587400177463, 0.945874001774626, 1.94587400177463, -3.05412599822537, 5.7083703046436, 4.7083703046436, 1.7083703046436, 11.7083703046436, 6.7083703046436, 7.7083703046436, 2.7083703046436, 3.7083703046436, 9.7083703046436, 8.7083703046436, 6.7083703046436, 6.7083703046436, -0.291629695356403, 5.7083703046436, 4.7083703046436, -1.2916296953564, 9.7083703046436, 8.7083703046436, 1.7083703046436, 2.7083703046436, 3.7083703046436)
This code creates a graph in which the red normal curve is far too small compared with the histogram bars.
How do I stretch the normal curve so that it fits the histogram?
(Note that this is not a question about how to superimpose a normal curve on a histogram in ggplot, even though that is what I am ultimately after, so this is not a duplicate.)
The area under the normal curve is 1, while the area of the histogram is the width of the bars times the number of points (each bar's height is a count). So if you multiply the height of the normal curve by that product, the curve will have the same area as the histogram. The following works; it estimates the bar width from the default of 30 bins, though it may be better and more direct to specify a binwidth:
# Normal density scaled by (approximate default bar width) * (number of points)
tmpfun <- function(x, mean, sd) {
  diff(range(residuals_list)) / 30 * length(residuals_list) * dnorm(x, mean, sd)
}
ggplot(exercise2d_df, aes(x = residuals_list)) +
  geom_histogram(alpha = 0.2, position = "identity") +
  stat_function(fun = tmpfun,
                args = list(mean = mean(residuals_list), sd = sd(residuals_list)),
                size = 1, color = "red")
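Following the suggestion to specify a binwidth, here is a minimal sketch that fixes the bar width explicitly, so the scaling factor is exact instead of being estimated from the data range. It assumes ggplot2 3.4.0 or later (where `linewidth` replaces `size` for lines). The simulated data, the bin width of 1, and the helper name `scaled_dnorm` are illustrative assumptions, not taken from the question.

```r
library(ggplot2)

# Illustrative stand-in data (assumption): substitute the question's
# residuals_list to reproduce the original plot.
set.seed(42)
residuals_list <- rnorm(100, mean = 0, sd = 5)
exercise2d_df <- data.frame(residuals_list = residuals_list)

bw <- 1                    # explicit bar width (illustrative choice)
n  <- nrow(exercise2d_df)  # number of points

# Hypothetical helper: normal density rescaled so its total area is
# bw * n, the same as the total area of a count histogram.
scaled_dnorm <- function(x, mean, sd, n, bw) {
  n * bw * dnorm(x, mean = mean, sd = sd)
}

p <- ggplot(exercise2d_df, aes(x = residuals_list)) +
  geom_histogram(binwidth = bw, alpha = 0.2) +
  stat_function(
    fun = scaled_dnorm,
    # n and bw are passed in explicitly rather than read from globals
    args = list(
      mean = mean(exercise2d_df$residuals_list),
      sd   = sd(exercise2d_df$residuals_list),
      n    = n,
      bw   = bw
    ),
    linewidth = 1, colour = "red"
  )

print(p)
```

Passing `n` and `bw` through `args` keeps the scaling function reusable: if you change `binwidth`, the curve rescales with it, and its area stays equal to `bw * n`, the combined area of the bars.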