jwimberley

Reputation: 1748

Possible ggplot2 bug: inconsistent normalizations of overlaid histograms

I recently discovered some odd behavior in ggplot2 by accident. The following code

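library(ggplot2)

N <- 1000
coin <- rep(c(0, 1), N / 2)   # alternating 0/1 labels, 500 of each
N1 <- sum(coin)
N0 <- sum(1 - coin)
values <- rep(0, N)
# Both categories are drawn from the same standard normal distribution
values[coin == 0] <- rnorm(N0, mean = 0, sd = 1)
values[coin == 1] <- rnorm(N1, mean = 0, sd = 1)
dat <- data.frame('Value' = values, 'Category' = as.factor(coin))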

creates a dataset that has one numeric column and one factor column, with equal numbers of events belonging to each of the two categories:

> summary(dat)
     Value           Category
 Min.   :-3.901785   0:500   
 1st Qu.:-0.669807   1:500   
 Median : 0.020031           
 Mean   :-0.008229           
 3rd Qu.: 0.650803           
 Max.   : 3.195819   

However, when plotting the Value column broken down by category, category 1 appears with a much greater normalization than category 0:

ggplot(dat,aes(x=Value,fill=Category)) + geom_histogram(alpha=0.5) + theme_bw()

This appears very odd. The bin widths appear equal for the two histograms, as they should be, but the total counts do not appear equal, even though each category contains 500 events. In fact, the category 0 histogram looks identical to the histogram of the entire dataset:

ggplot(dat,aes(x=Value)) + geom_histogram(alpha=0.5) + theme_bw()

Is this a ggplot2 bug, or am I making some mistake I haven't noticed? (I get the same thing if I replace categories 0 and 1 with 'A' and 'B' by the way).

System details:

Upvotes: 3

Views: 52

Answers (1)

James

Reputation: 66864

geom_histogram defaults to stacking the bars on top of one another via the argument position="stack". With stacking, the top of the stack traces the histogram of the whole dataset, which is why one category seems to contain all the events. This is useful for seeing the overall composition and the contribution of each part at the same time, but not for comparing the parts directly. You can override this by setting the position argument to "identity", e.g.:

ggplot(dat,aes(x=Value,fill=Category)) +
 geom_histogram(alpha=0.5, position="identity") + theme_bw()

Histogram using position="identity"
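
If you want to convince yourself that stacking accounts for the picture, a rough check in base R (no ggplot2 needed) is to bin each category on shared breaks and compare the summed counts with the counts for the full dataset. This is only a sketch: the seed, the 30 bins and the names `breaks`, `counts_all`, `counts_0`, `counts_1` and `stacked_top` are my own choices for illustration, not something ggplot2 uses internally.

```r
# Same data layout as in the question; the seed is an assumption added
# only so the check is reproducible.
set.seed(1)
N <- 1000
coin <- rep(c(0, 1), N / 2)
values <- rnorm(N, mean = 0, sd = 1)
dat <- data.frame(Value = values, Category = as.factor(coin))

# Shared bin edges that cover the full range of the data (30 bins,
# chosen here for illustration).
breaks <- seq(floor(min(dat$Value)), ceiling(max(dat$Value)), length.out = 31)

# Per-bin counts for the whole dataset and for each category separately.
counts_all <- hist(dat$Value, breaks = breaks, plot = FALSE)$counts
counts_0 <- hist(dat$Value[dat$Category == "0"], breaks = breaks, plot = FALSE)$counts
counts_1 <- hist(dat$Value[dat$Category == "1"], breaks = breaks, plot = FALSE)$counts

# Height of the top of each stacked bar: one category sitting on the other.
stacked_top <- counts_0 + counts_1

# TRUE when the stacked outline equals the histogram of the full dataset.
all(stacked_top == counts_all)
```

Each category still has 500 events; the stacked plot just draws one category starting from the top of the other instead of from zero. With ggplot2 loaded, `layer_data()` on the plot object should let you inspect the computed bar heights directly.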

Upvotes: 5
