philh

Reputation: 646

Faceted density histogram

I'm trying to plot two histograms side-by-side, showing the density of each observation value under each condition.

For example, if I have the following data frame:

> (test <- data.frame(rain=c(T,T,T,F,F), bikes=as.integer(c(1,1,2,1,2)), location=as.factor(c('a','b','a','b','b'))))
   rain bikes location
1  TRUE     1        a
2  TRUE     1        b
3  TRUE     2        a
4 FALSE     1        b
5 FALSE     2        b

Then I want to draw one histogram for rain=FALSE with two bars of height 0.5, and another for rain=TRUE with bars of height 2/3 (bikes=1) and 1/3 (bikes=2).

I've tried this:

ggplot(test, aes(x=bikes, y=..density..)) + 
  geom_bar() + 
  scale_x_discrete() + 
  facet_wrap(~rain) + 
  scale_y_continuous(breaks=seq(0, 1, 0.05))

and it gives the correct shape, but every bar is about 10% too tall:

Bad density picture

I've also tried y=..count../sum(..count..), but there the bar heights are 0.2, 0.2, 0.4, 0.2 - it seems to be summing over the whole data frame, not just the rain condition.
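A quick base R check, outside ggplot2, is consistent with that reading. This is only a sketch on the example data above; the names `counts`, `whole_frame` and `per_condition` are made up for illustration. Dividing the counts by the grand total reproduces the 0.2, 0.2, 0.4, 0.2 heights, while dividing within each rain condition gives the heights that are actually wanted.

```r
# Example data from the question
test <- data.frame(
  rain = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  bikes = as.integer(c(1, 1, 2, 1, 2)),
  location = factor(c("a", "b", "a", "b", "b"))
)

# Cross-tabulate: rows are rain conditions, columns are bikes values
counts <- table(rain = test$rain, bikes = test$bikes)

# Normalising by the total of the whole data frame (what count/sum(count) appears to do)
whole_frame <- counts / sum(counts)

# Normalising each rain condition by its own total (the desired bar heights)
per_condition <- prop.table(counts, margin = 1)

print(whole_frame)
print(per_condition)
```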

(I don't really get the ..foo.. syntax. I've seen this answer, but I still don't get where density and count come from.)
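For context, `..count..` and `..density..` are not columns of the data frame: they are variables computed by the stat behind the bar layer, and the `..name..` syntax asks aes() to use the computed value. A density is a proportion divided by the bin width, so one possible explanation for the bars being roughly 10% too tall is a bin width of 0.9. That width is an assumption here, not something confirmed in the thread; the sketch below only shows the arithmetic for the rain=FALSE panel.

```r
# Assumption: each bin is 0.9 units wide (not confirmed in the thread)
bin_width <- 0.9

# rain = FALSE: one observation each for bikes = 1 and bikes = 2
counts_dry <- c(1, 1)

# Proportion within the panel: the heights that were expected (0.5 each)
proportion <- counts_dry / sum(counts_dry)

# Density: proportion divided by the bin width
density <- proportion / bin_width

print(density)               # each bar comes out taller than 0.5
print(density / proportion)  # inflation factor, 1 / 0.9
```

Under that assumption every bar is scaled by 1/0.9, which is about 11% taller than the plain proportion.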

I know I could create a temporary data frame to plot instead, but I prefer to avoid that - doing everything from the same data frame feels more flexible for things I might want to do in future - and I haven't come up with a non-awful way of doing it.

Ideally, I'd also like to color the bars by location. If I do that with ..density.., the statistic is apparently calculated over each of four conditions (rain-a, rain-b, dry-a, dry-b). I want it calculated only over the rain/dry condition.

Upvotes: 0

Views: 131

Answers (1)

jed

Reputation: 615

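Heyo, it's much easier to modify your data frame first and then plot the result. Give every row a weight of 1/n, where n is the number of rows in its rain condition, and let the stacked bars add those weights up. Using the plyr package: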

❥ library(plyr)
❥ test2 <- ddply(test, .(rain), transform, proportion = 1/length(rain))
❥ test2
   rain bikes location proportion
1 FALSE     1        b     0.5000
2 FALSE     2        b     0.5000
3  TRUE     1        a     0.3333
4  TRUE     1        b     0.3333
5  TRUE     2        a     0.3333
❥ library(ggplot2)
❥ library(scales)  # provides percent for the y-axis labels
❥ ggplot(test2, aes(x = bikes)) +
    geom_bar(aes(y = proportion), stat = "identity") +
    facet_grid(~rain) +
    scale_y_continuous(labels = percent) +
    scale_x_continuous(breaks = 1:max(test2$bikes))

corrected plot
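The question also asks about coloring the bars by location. A minimal sketch of one way to do that with the same approach, assuming the per-row weights are still computed per rain condition only: map `fill` to `location` and let the default stacking do the rest. The `heights` table is a made-up name, added only to check that the stacked segments add up to the wanted bar heights.

```r
library(plyr)
library(ggplot2)
library(scales)

# Example data from the question
test <- data.frame(
  rain = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  bikes = as.integer(c(1, 1, 2, 1, 2)),
  location = factor(c("a", "b", "a", "b", "b"))
)

# Weight each row by 1/n within its rain condition (location is ignored here)
test2 <- ddply(test, .(rain), transform, proportion = 1 / length(rain))

# Stacked bars: segments are colored by location, totals are per rain condition
p <- ggplot(test2, aes(x = bikes, y = proportion, fill = location)) +
  geom_bar(stat = "identity") +
  facet_grid(~rain) +
  scale_y_continuous(labels = percent) +
  scale_x_continuous(breaks = 1:max(test2$bikes))

# Sanity check: total bar height per rain condition and bikes value
heights <- tapply(test2$proportion, list(test2$rain, test2$bikes), sum)
print(heights)
```

Because location plays no part in the weights, each panel still sums to 100% regardless of how the bars are split by color.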

Upvotes: 1
