Possible ggplot2 bug: inconsistent normalizations of overlaid histograms

Question

I recently discovered some odd behavior in ggplot2 by accident. The following code

N <- 1000
coin <- rep(c(0,1),N/2)
N1 <- sum(coin)
N0 <- sum(1-coin)
values <- rep(0,N)
values[coin==0] <- rnorm(N0,mean=0,sd=1)
values[coin==1] <- rnorm(N1,mean=0,sd=1)
dat = data.frame('Value'=values,'Category'=as.factor(coin))

creates a dataset that has one numeric column and one factor column, with equal numbers of events belonging to each of the two categories:

> summary(dat)
     Value           Category
 Min.   :-3.901785   0:500   
 1st Qu.:-0.669807   1:500   
 Median : 0.020031           
 Mean   :-0.008229           
 3rd Qu.: 0.650803           
 Max.   : 3.195819

However, when plotting the Value column broken down by category, category 0 appears with a much greater normalization than category 1:

ggplot(dat,aes(x=Value,fill=Category)) + geom_histogram(alpha=0.5) + theme_bw()

This appears very odd. The bin widths appear equal for the two histograms, as they should be, but the total counts of events do not appear equal, even though they should be. The category 0 histogram is in fact the histogram of the entire dataset:

ggplot(dat,aes(x=Value)) + geom_histogram(alpha=0.5) + theme_bw()

Is this a ggplot2 bug, or am I making some mistake I haven't noticed? (I get the same thing if I replace categories 0 and 1 with 'A' and 'B', by the way.)

System details:

Mac OS X High Sierra
R version 3.4.0 (2017-04-21)
ggplot2_2.2.1

James · Accepted Answer

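geom_histogram defaults to stacking the bars atop one another via the argument position="stack". Because one category's bars sit on top of the other's, the top edge of the upper category (here category 0) traces the histogram of the whole dataset, which is why the counts look unequal. Stacking is useful for seeing the overall composition and the contribution of each part at the same time, but not so useful for comparing the parts directly. You can override this by changing the position argument to "identity", e.g.: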

ggplot(dat,aes(x=Value,fill=Category)) +
 geom_histogram(alpha=0.5, position="identity") + theme_bw()
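If you want to confirm that stacking, rather than a miscount, explains the plot, one option is to inspect the data ggplot2 computes for each layer with layer_data(). The following is a minimal sketch, not part of the original answer: the seed, the explicit bins = 30, and the names p_stack, p_ident, d_stack and d_ident are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same shape as the dataset in the question: 500 values per category
dat <- data.frame(Value = rnorm(1000),
                  Category = factor(rep(c(0, 1), 500)))

# Default behaviour (position = "stack") versus overlaid bars
p_stack <- ggplot(dat, aes(x = Value, fill = Category)) +
  geom_histogram(bins = 30)
p_ident <- ggplot(dat, aes(x = Value, fill = Category)) +
  geom_histogram(bins = 30, position = "identity")

# layer_data() returns the computed bins, including raw counts and bar extents
d_stack <- layer_data(p_stack)
d_ident <- layer_data(p_ident)

# Raw counts per group are equal regardless of position
count_by_group <- tapply(d_stack$count, d_stack$group, sum)

# Summed bar tops per group: stacking lifts one group onto the other,
# while "identity" leaves every bar starting at zero
top_stack <- tapply(d_stack$ymax, d_stack$group, sum)
top_ident <- tapply(d_ident$ymax, d_ident$group, sum)

print(count_by_group)
print(top_stack)
print(top_ident)
```

With stacking, the summed bar tops of one group equal the total number of rows, while with position="identity" both groups sum to their own counts. Which group ends up on top of the stack can differ between ggplot2 versions, so the sketch does not rely on the order.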
