user2750268

Reputation: 23

ggplot2 histogram of factors showing the probability mass instead of count

I am trying to use ggplot2's bar geom to plot the probability mass rather than the count. However, with aes(y=..density..) the bar heights do not sum to one (although they come close). I suspect the cause is the default binwidth for factors. Here is an example of the problem:

library(ggplot2)

# Two age groups with four observations each
age <- c(rep(0,4), rep(1,4))
mppf <- c(1,1,1,0,1,1,0,0)
data.test <- as.data.frame(cbind(age,mppf))
data.test$age <- as.factor(data.test$age)
data.test$mppf <- as.factor(data.test$mppf)
# Bars of mppf, one set per age group, with density on the y-axis
p.test.density <- ggplot(data.test, aes(mppf, group=age, fill=age)) +
  geom_bar(aes(y=..density..), position='dodge') +
  scale_y_continuous(limits=c(0,1))
dev.new()
print(p.test.density)

I can get around this problem by keeping the x-variable as continuous and setting binwidth=1, but it doesn't seem very elegant.

# Convert via as.character() so the values stay 0/1;
# as.numeric() on a factor returns the level codes (1, 2) instead
data.test$mppf.numeric <- as.numeric(as.character(data.test$mppf))
p.test.density.numeric <- ggplot(data.test, aes(mppf.numeric, group=age, fill=age)) +
  geom_histogram(aes(y=..density..), position='dodge', binwidth=1) +
  scale_y_continuous(limits=c(0,1))
dev.new()
print(p.test.density.numeric)

Upvotes: 2

Views: 7373

Answers (1)

aosmith

Reputation: 36076

I think you almost have it figured out. The missing piece is that you need a bar plot and not a histogram.

The default width for bars with categorical data is 0.9 (see ?stat_bin; the help page for geom_bar doesn't give the default bar width but does send you to stat_bin for further reading). The density is the within-group proportion divided by the bar width, so your plots show the correct density for a bar width of 0.9, and the heights sum to 1/0.9 (about 1.11) rather than 1. Simply change to a width of 1 and you will see the density values you expected to see.
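To make the arithmetic concrete, here is a minimal base-R sketch for one age group (age 0 in the question's data). It assumes the density is computed as count / sum(count) / width; `bar_density` is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# Counts of mppf within the age == 0 group from the question: one 0, three 1s
counts <- table(factor(c(1, 1, 1, 0), levels = c(0, 1)))

# Hypothetical helper, assuming density = count / sum(count) / width
bar_density <- function(counts, width) {
  as.numeric(counts) / sum(counts) / width
}

d09 <- bar_density(counts, width = 0.9)  # default bar width for categorical data
d1  <- bar_density(counts, width = 1)    # width used in the fix

sum(d09)  # exceeds 1, because each proportion is divided by 0.9
sum(d1)   # the proportions themselves, which sum to 1
```

With a width of 1 the density and the within-group proportion coincide, which is why changing the width gives the expected heights.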

ggplot(data.test, aes(x = mppf, group = age, fill = age)) +
  geom_bar(aes(y=..density..), position = "dodge", width = 1) +
  scale_y_continuous(limits=c(0,1))
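If you are on a more recent ggplot2, the following sketch may be a closer fit. It assumes ggplot2 3.3.0 or later, where geom_bar() counts with stat_count and exposes a per-group proportion as the computed variable `prop`, accessed through after_stat(). The name `p.test.prop` is just an illustrative choice, and the data frame is rebuilt here so the block runs on its own.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.3.0, which provides after_stat()

# Same data as in the question, built directly as factors
data.test <- data.frame(
  age  = factor(c(rep(0, 4), rep(1, 4))),
  mppf = factor(c(1, 1, 1, 0, 1, 1, 0, 0))
)

# prop is computed within each group, so group = age makes the bars
# of each age group sum to 1 regardless of the bar width
p.test.prop <- ggplot(data.test, aes(x = mppf, group = age, fill = age)) +
  geom_bar(aes(y = after_stat(prop)), position = "dodge") +
  scale_y_continuous(limits = c(0, 1))
print(p.test.prop)
```

Because `prop` does not depend on the bar width, this version keeps the default width of 0.9 and the small gaps between bars.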

Upvotes: 3
