J. Dowee

Reputation: 379

Why is the histogram from `ggplot()` the same as the one for only one value of the variable used for `aes` `fill`?

The question concerns two observations about the following three figures:

(1) Why are the histograms in (a) and (b) different if the number of bins is the same?
(2) The histogram in (b) is exactly the same as the histogram for the fill nonsmo. If this is the case, how do I make a histogram of the complete data using ggplot()?

(a) Plot using hist(chol$AGE,30).

Histogram using hist()

(b) Histogram plotted with ggplot(data=chol, aes(chol$AGE)) + geom_histogram() and default values, i.e. 30 bins.

Histogram with ggplot()

(c) Now adding fill with respect to the variable SMOKE:

ggplot(data=chol, aes(chol$AGE)) + 
  geom_histogram(aes(fill = chol$SMOKE))

Histogram using ggplot() with fill.

Upvotes: 0

Views: 260

Answers (2)

Dave2e

Reputation: 24079

Most likely a large number of values fall exactly on the bins' upper and lower limits, so depending on whether the intervals are left-open or right-open, counts can shift significantly between adjacent bins.

For example compare:

set.seed(10)
age<-as.integer(rnorm(100, 50, 20))
par(mfrow=c(2, 1))
hist(age, 30, right=TRUE)
hist(age, 30, right=FALSE)

Notice that only about 18 bins were created (bin width of 5), because `hist()` treats a single number given as `breaks` as a suggestion only.
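The effect of the interval convention can also be checked numerically instead of by eye. The following is a minimal sketch on a small made-up vector `x` (an assumption for illustration, not the `chol` data), with explicit `edges` so that both calls use identical bin limits and only `right` differs:

```r
# Made-up data: every value sits exactly on a bin limit (multiples of 5)
x <- c(10, 10, 15, 20, 20, 20, 25, 30)

# Explicit breaks so that both calls share the same bin edges
edges <- seq(10, 30, by = 5)

# right = TRUE: bins are (a, b]; the lowest edge is included in the first bin
h_right <- hist(x, breaks = edges, right = TRUE, plot = FALSE)

# right = FALSE: bins are [a, b); the highest edge is included in the last bin
h_left <- hist(x, breaks = edges, right = FALSE, plot = FALSE)

print(h_right$counts)  # 3 3 1 1
print(h_left$counts)   # 2 1 3 2
```

Same data and same edges, yet the counts differ in every bin, which is the kind of shift that integer ages can produce.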

Compare that with ggplot2, where the bins are shifted to the center of the bin range:

library(ggplot2)
ggplot(data.frame(age), aes(age)) + geom_histogram()


Upvotes: 0

J. Dowee

Reputation: 379

Here is what I did after the comments by @Dave2e:

ggplot(data = chol, aes(AGE, fill = SMOKE)) + 
  # after_stat(count) replaces ..count.., which is deprecated since ggplot2 3.4.0
  geom_histogram(aes(y = after_stat(count)), binwidth = 1, position = "stack")

hist(chol$AGE, breaks = 30, right = FALSE)


Setting the correct value for binwidth, realizing that the default position is "stack", and using right = FALSE gave exactly the same histograms.
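The point about stacking can be illustrated without any plotting. This is a rough base R sketch that mimics what stacking does, not ggplot2's internals. The vectors `age` and `smoke` are made up, and the level name "smoker" is an assumption (only "nonsmo" appears in the question). It bins each group separately and checks that the per-bin totals equal the histogram of the complete data:

```r
# Made-up stand-in for chol$AGE and chol$SMOKE
age   <- c(21, 25, 25, 30, 34, 35, 40, 40)
smoke <- factor(c("nonsmo", "smoker", "nonsmo", "nonsmo",
                  "smoker", "nonsmo", "smoker", "nonsmo"))

# Shared bin edges, left-closed bins [a, b) as with right = FALSE
edges <- seq(20, 40, by = 5)

# Histogram of the complete data
overall <- hist(age, breaks = edges, right = FALSE, plot = FALSE)$counts

# Counts per bin and per group: one column per fill level
per_group <- table(cut(age, breaks = edges, right = FALSE,
                       include.lowest = TRUE), smoke)

# Stacking adds the columns on top of each other within each bin
stacked_total <- unname(rowSums(per_group))

print(per_group)
print(overall)        # 1 2 2 3
print(stacked_total)  # 1 2 2 3
```

So the outline of a stacked, filled histogram is the histogram of all observations, and each colored segment is one group's share of that bin.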

Upvotes: 1
