Reputation: 379
My question is about two observations related to the following three figures:
(1) Why are the histograms in (a) and (b) different if the number of bins is the same?
(2) The histogram in (b) is exactly the same as the histogram for the fill nonsmo. If this is the case, how do I make a histogram of the complete data using ggplot()?
(a) Plot using hist(chol$AGE,30).
(b) Histogram plotted with ggplot(data=chol, aes(chol$AGE)) + geom_histogram() and default values, i.e. 30 bins.
(c) Now adding fill with respect to the variable SMOKE:
ggplot(data=chol, aes(chol$AGE)) +
  geom_histogram(aes(fill = chol$SMOKE))
Upvotes: 0
Views: 260
Reputation: 24079
Most likely a large number of values fall exactly on the bins' upper and lower limits, so depending on whether the bins are left-open or right-open, there can be a significant shift of counts between bins.
For example compare:
set.seed(10)
age <- as.integer(rnorm(100, 50, 20))
par(mfrow=c(2, 1))
hist(age, 30, right=TRUE)
hist(age, 30, right=FALSE)
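Notice that only about 18 bins were created (bin width of 5), not 30: hist() treats a single number for breaks as a suggestion and picks "pretty" breakpoints.
Now compare with ggplot2, where the bins are shifted to the center of the bin range: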
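To see the boundary effect in numbers rather than in a plot, here is a minimal sketch. The vectors ages and brks are made up for illustration (they are not from the question's chol data), and every value sits exactly on a bin boundary:

```r
# Hypothetical ages that all fall exactly on a bin boundary
ages <- c(20, 20, 25, 25, 25, 30)
brks <- seq(15, 35, by = 5)   # explicit breaks, so hist() uses them as given

# right = TRUE (default): bins are [15,20], (20,25], (25,30], (30,35]
counts_right <- hist(ages, breaks = brks, right = TRUE, plot = FALSE)$counts

# right = FALSE: bins are [15,20), [20,25), [25,30), [30,35]
counts_left <- hist(ages, breaks = brks, right = FALSE, plot = FALSE)$counts

print(counts_right)  # 2 3 1 0
print(counts_left)   # 0 2 3 1
```

Same data, same breaks, but every count moves one bin over, which is the kind of shift that integer ages can produce.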
library(ggplot2)
ggplot(data.frame(age), aes(age)) + geom_histogram()
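If the goal is to make the ggplot2 bins line up with the hist() ones, one possible approach (a sketch, not the only way) is to set the bin width, the bin boundary and the closed side explicitly in geom_histogram(). The values 5 and 0 below are assumptions chosen to mimic the width-5 bins from the hist() example above:

```r
library(ggplot2)

set.seed(10)
age <- as.integer(rnorm(100, 50, 20))

# binwidth = 5 and boundary = 0 put the bin edges on multiples of 5;
# closed = "left" makes the bins left-closed, like hist(..., right = FALSE)
p <- ggplot(data.frame(age), aes(age)) +
  geom_histogram(binwidth = 5, boundary = 0, closed = "left")

# Inspect the bins ggplot2 actually computed for the first layer
bins <- layer_data(p, 1)
print(bins[, c("xmin", "xmax", "count")])
```

Printing the computed bins makes it easy to compare them with hist(age, 30, right = FALSE, plot = FALSE)$breaks and $counts.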
Upvotes: 0
Reputation: 379
Here is what I did after the comments by @Dave2e:
ggplot(data=chol, aes(AGE, fill = SMOKE)) +
  geom_histogram(aes(y = ..count..), binwidth = 1, position = "stack")
hist(chol$AGE, breaks = 30, right = FALSE)
Setting the correct value for binwidth, realizing that the default position is stack, and setting right to FALSE gave exactly the same histograms.
Upvotes: 1