Why is the histogram from `ggplot()` the same as the one for only one level of the variable used for `aes` `fill`?

Question

The question concerns two observations about the following three figures:

(1) Why are the histograms in (a) and (b) different if the number of bins is the same?
(2) The histogram in (b) looks exactly the same as the histogram for the fill level nonsmo. If that is the case, how can I make a histogram of the complete data using ggplot()?

(a) Plot using `hist(chol$AGE, 30)`.

(b) Histogram plotted with `ggplot(data=chol, aes(chol$AGE)) + geom_histogram()` and default values, i.e. 30 bins.

(c) Now adding fill with respect to the variable SMOKE:

ggplot(data=chol, aes(chol$AGE)) + 
  geom_histogram(aes(fill = chol$SMOKE))

Dave2e · Accepted Answer

Most likely a large number of values fall exactly on the bins' upper and lower limits, so depending on whether the bins are left-open or right-open, the counts can shift significantly from one bin to the next.

For example compare:

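set.seed(10)
age <- as.integer(rnorm(100, 50, 20))  # integer ages, so many values land exactly on bin limits
par(mfrow = c(2, 1))                   # draw the two plots one above the other
hist(age, 30, right = TRUE)            # right-closed bins: (a, b]
hist(age, 30, right = FALSE)           # left-closed bins: [a, b)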

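Notice that only about 18 bins were created (bin width of 5): when `breaks` is a single number, `hist()` treats it as a suggestion and picks "pretty" break points instead.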
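To check this numerically rather than by eye, here is a minimal sketch that reuses the simulated `age` vector from above and asks `hist()` for the binning without drawing it (`plot = FALSE`). The names `h_right` and `h_left` are just illustrative.

```r
set.seed(10)
age <- as.integer(rnorm(100, 50, 20))

# plot = FALSE returns the binning without drawing anything
h_right <- hist(age, 30, right = TRUE, plot = FALSE)   # bins (a, b]
h_left  <- hist(age, 30, right = FALSE, plot = FALSE)  # bins [a, b)

length(h_right$counts)         # number of bins actually used
unique(diff(h_right$breaks))   # bin width chosen by hist()
sum(age %in% h_right$breaks)   # how many values sit exactly on a break

# Same breaks in both cases; values on a break move to the neighbouring bin
rbind(right_closed = h_right$counts, left_closed = h_left$counts)
```

Running the same lines on the real data (`chol$AGE` in the question) should show how many observations sit exactly on a break and are therefore sensitive to the open/closed choice.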

With ggplot2, which uses 30 bins by default, the bins are shifted toward the center of the bin range:

library(ggplot2)
ggplot(data.frame(age), aes(age)) + geom_histogram()
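If the goal is to make the ggplot2 histogram line up with the `hist()` one, one option is to hand ggplot2 the same break points. This is only a sketch on the simulated `age` data: it takes the breaks from `hist(..., plot = FALSE)` and passes them to `geom_histogram()` through its `breaks` and `closed` arguments. The names `h` and `p` are illustrative.

```r
library(ggplot2)

set.seed(10)
age <- as.integer(rnorm(100, 50, 20))

# Take the break points that hist() picked, without drawing the base plot
h <- hist(age, 30, right = TRUE, plot = FALSE)

# Pass the same breaks to geom_histogram(); closed = "right" mirrors right = TRUE
p <- ggplot(data.frame(age), aes(age)) +
  geom_histogram(breaks = h$breaks, closed = "right", colour = "white")
p

# Compare the counts ggplot2 computed with those from hist()
all.equal(layer_data(p)$count, h$counts)
```

For the data in the question, replacing `age` with the `AGE` column of `chol` should work the same way, assuming that column is numeric.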
