Reputation: 646
I'm trying to plot two histograms side-by-side, showing the density of each observation value under each condition.
For example, if I have the following data frame:
> (test <- data.frame(rain=c(T,T,T,F,F), bikes=as.integer(c(1,1,2,1,2)), location=as.factor(c('a','b','a','b','b'))))
   rain bikes location
1  TRUE     1        a
2  TRUE     1        b
3  TRUE     2        a
4 FALSE     1        b
5 FALSE     2        b
Then I want to draw a histogram for rain=FALSE with two bars of height 0.5; and another for rain=TRUE with bars of height 1/3 and 2/3.
I've tried this
ggplot(test, aes(x=bikes, y=..density..)) +
  geom_bar() +
  scale_x_discrete() +
  facet_wrap(~rain) +
  scale_y_continuous(breaks=seq(0, 1, 0.05))
and it gives the correct shape, but every bar is about 10% too tall:
I've also tried y=..count../sum(..count..), but there the bar heights are 0.2, 0.2, 0.4, 0.2. It seems to be summing over the whole data frame, not just within each rain condition.
(I don't really get the ..foo.. syntax. I've seen this answer, but I still don't get where density and count come from.)
I know I could create a temporary data frame to plot instead, but I'd prefer to avoid that: doing everything from the same data frame feels more flexible for things I might want to do in future, and I haven't come up with a non-awful way of doing it.
Ideally, I'd also like to color the bars by location. If I do that with ..density.., I get this result:
where the statistic's apparently been calculated over each of four conditions (rain-a, rain-b, dry-a, dry-b). I want it only calculated over the rain/dry condition.
Upvotes: 0
Views: 131
Reputation: 615
Heyo, it's much easier to modify your data frame in order to get R to do what you want. Using the plyr package:
❥ library(plyr)
❥ library(ggplot2)
❥ library(scales)  # provides percent, used for the y-axis labels below
❥ test2 <- ddply(test, .(rain), transform, proportion = 1/length(rain))
❥ test2
   rain bikes location proportion
1 FALSE     1        b     0.5000
2 FALSE     2        b     0.5000
3  TRUE     1        a     0.3333
4  TRUE     1        b     0.3333
5  TRUE     2        a     0.3333
❥ ggplot(test2, aes(x=bikes)) + geom_bar(aes(y = proportion), stat= "identity") + facet_grid(~rain) + scale_y_continuous(labels=percent) + scale_x_continuous(breaks = 1:max(test2$bikes))
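If you'd rather not depend on plyr, the same per-condition proportion can be sketched in base R with ave(). This is only a rough sketch, assuming the test data frame from the question; the names test3, heights and p are made up for illustration. Each row gets a weight of 1/n for its rain group, so the stacked bar heights sum to 1 within each facet, and adding fill=location should color the stacked segments without changing how the proportions are calculated.

```r
library(ggplot2)

# Data frame from the question
test <- data.frame(rain = c(TRUE, TRUE, TRUE, FALSE, FALSE),
                   bikes = as.integer(c(1, 1, 2, 1, 2)),
                   location = factor(c("a", "b", "a", "b", "b")))

# Each row contributes 1/n, where n is the number of rows with the same rain value
# (test3 is an illustrative name, not from the original post)
test3 <- transform(test, proportion = 1 / ave(bikes, rain, FUN = length))

# Total bar height per rain/bikes combination, as a sanity check
heights <- aggregate(proportion ~ rain + bikes, data = test3, FUN = sum)

# Rows with the same bikes value are stacked, so the bar height is the
# per-condition proportion; fill only splits each bar by location
p <- ggplot(test3, aes(x = factor(bikes), y = proportion, fill = location)) +
  geom_bar(stat = "identity") +
  facet_wrap(~rain)
```

Printing p draws the plot, and printing heights shows the bar totals the plot is built from.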
Upvotes: 1