I am making a series of density plots with geom_density from a dataframe, and showing it by condition using facet_wrap, as in:
ggplot(iris) + geom_density(aes(x=Sepal.Width, colour=Species, y=..count../sum(..count..))) + facet_wrap(~Species)
When I do this, the y-axis scale seems to not represent percent of each Species
in a panel, but rather the percent of all the total datapoints across all species.
My question is: How can I make it so the ..count.. variable in geom_density refers to the count of items in each Species set of each panel, so that the panel for virginica has a y-axis corresponding to "Fraction of virginica data points"?
Also, is there a way to get ggplot2 to output the values it uses for ..count.. and sum(..count..) so that I can verify what numbers it is using?
edit: I misunderstood geom_density. It looks like even for a single Species, ..count../sum(..count..) is not a percentage:
ggplot(iris[iris$Species == 'virginica',]) + geom_density(aes(x=Sepal.Width, colour=Species, y=..count../sum(..count..))) + facet_wrap(~Species)
So my revised question: how can I get the density plot to show the fraction of data in each bin? Do I have to use stat_density or geom_histogram for this? I just want the y-axis to be the percentage / fraction of data points.
Upvotes: 5
Views: 8739
Reputation: 61
Passing the argument scales='free_y' to facet_wrap() should do the trick.
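A minimal sketch of what that suggestion looks like, assuming ggplot2 is installed. Note that scales='free_y' only lets each panel pick its own y-axis range; it does not change the values that are plotted, so on its own it may not give per-species fractions.

```r
library(ggplot2)

# Each panel gets its own y-axis range; the plotted density values are unchanged.
p <- ggplot(iris, aes(Sepal.Width, colour = Species)) +
  geom_density() +
  facet_wrap(~Species, scales = "free_y")

# print(p) draws the figure
```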
Upvotes: 1
Reputation: 2902
Unfortunately, what you are asking ggplot2 to do is define separate y's for each facet, which it syntactically cannot do AFAIK.
So, in response to your mentioning in the comment thread that you "just want a histogram fundamentally", I would suggest instead using geom_histogram or, if you're partial to lines instead of bars, geom_freqpoly:
ggplot(iris, aes(Sepal.Width, ..count..)) +
geom_histogram(aes(colour=Species, fill=Species), binwidth=.2) +
geom_freqpoly(colour="black", binwidth=.2) +
facet_wrap(~Species)
Note: geom_freqpoly works just as well in place of geom_histogram in my example above. I just put both in one plot for the sake of efficiency.
Hope this helps.
EDIT: Alright, I managed to work out a quick-and-dirty way of getting what you want. It requires that you install and load plyr. Apologies in advance; this is likely not the most efficient way to do this in terms of RAM usage, but it works.
First, let's get iris out in the open (I use RStudio so I'm used to seeing all my objects in a window):
d <- iris
Now, we can use ddply to count the number of individuals belonging to each unique measurement of what will become your x-axis (here I used Sepal.Length instead of Sepal.Width, to give myself a bit more range, simply for seeing a bigger difference between groups when plotted).
new <- ddply(d, c("Species", "Sepal.Length"), summarize, count=length(Sepal.Length))
Note that ddply automatically sorts the output data.frame according to the quoted variables.
Then we can divvy up the data.frame into each of its unique conditions--in the case of iris, each of the three species (I'm sure there's a much smoother way to go about this, and if you're working with really large amounts of data it's not advisable to keep creating subsets of the same data.frame because you could max out your RAM)...
set <- new[which(new$Species%in%"setosa"),]
ver <- new[which(new$Species%in%"versicolor"),]
vgn <- new[which(new$Species%in%"virginica"),]
... and use ddply again to calculate proportions of individuals falling under each measurement, but separately for each species.
prop <- rbind(ddply(set, c("Species"), summarize, prop=set$count/sum(set$count)),
ddply(ver, c("Species"), summarize, prop=ver$count/sum(ver$count)),
ddply(vgn, c("Species"), summarize, prop=vgn$count/sum(vgn$count)))
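The count, split, and per-species proportion steps above can probably be collapsed into a couple of lines. The sketch below is a base R stand-in (aggregate() and ave() instead of plyr), and the name new_base is a hypothetical one chosen here so it does not overwrite the objects used in this answer.

```r
# Count observations per Species and Sepal.Length (stand-in for the first ddply call).
new_base <- aggregate(count ~ Species + Sepal.Length,
                      data = transform(iris, count = 1L),
                      FUN = sum)

# Sort the same way ddply would: by Species, then by Sepal.Length.
new_base <- new_base[order(new_base$Species, new_base$Sepal.Length), ]

# Divide each count by the total count of its own species.
new_base$prop <- new_base$count / ave(new_base$count, new_base$Species, FUN = sum)
```

With plyr loaded, ddply(new, "Species", transform, prop = count / sum(count)) should add the same column in one call, without creating the three subsets.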
Then we just put everything we need into one dataset and remove all the junk from our workspace.
new$prop <- prop$prop
rm(list=ls()[which(!ls()%in%c("new", "d"))])
And we can make our figure with facet-specific proportions on the y. Note that I'm now using geom_line since ddply has automatically ordered your data.frame.
ggplot(new, aes(Sepal.Length, prop)) +
geom_line(aes(colour=Species)) +
facet_wrap(~Species)
# let's check our work. each should equal 50
sum(new$count[which(new$Species%in%"setosa")])
sum(new$count[which(new$Species%in%"versicolor")])
sum(new$count[which(new$Species%in%"virginica")])
#... and each of these should equal 1
sum(new$prop[which(new$Species%in%"setosa")])
sum(new$prop[which(new$Species%in%"versicolor")])
sum(new$prop[which(new$Species%in%"virginica")])
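As an alternative to the manual route above, here is a sketch that stays inside ggplot2. It relies on stat_bin computing its density variable separately for each group, so density times bin width is the fraction of that group's points in each bin. It assumes ggplot2 3.3.0 or later for after_stat() (older versions would write ..density.. * ..width..), and it uses Sepal.Width as in the question. layer_data() returns the values ggplot2 actually computed, which is one way to check the numbers.

```r
library(ggplot2)

# y = density * width equals count / sum(count) within each group,
# and each facet panel holds exactly one Species group here.
p <- ggplot(iris, aes(Sepal.Width)) +
  geom_histogram(aes(y = after_stat(density * width), fill = Species),
                 binwidth = 0.2) +
  facet_wrap(~Species) +
  ylab("Fraction of data points in panel")

# Pull out the computed values for the first (and only) layer.
computed <- layer_data(p, 1)

# Sum of bar heights per panel; each total should be 1 if y is a per-panel fraction.
fraction_totals <- tapply(computed$y, computed$PANEL, sum)
print(fraction_totals)
```

The same layer_data() call on the original geom_density plot should show what ..count../sum(..count..) evaluates to there.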
Upvotes: 6
Reputation: 7714
Maybe using table() and barplot() you might be able to get what you need. I'm still not sure if this is what you are after...
barplot(table(iris[iris$Species == 'virginica',1]))
With ggplot2
tb <- table(iris[iris$Species == 'virginica',1])
tb <- as.data.frame(tb)
ggplot(tb, aes(x=Var1, y=Freq)) + geom_bar(stat="identity")
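If fractions rather than raw counts are wanted, wrapping the table in prop.table() is one option. This is only a sketch; the names virginica_lengths and tb_frac and the column labels are illustrative choices, not part of the answer above.

```r
library(ggplot2)

# Column 1 of iris is Sepal.Length; keep only the virginica rows.
virginica_lengths <- iris[iris$Species == "virginica", 1]

# prop.table() rescales the counts so they sum to 1.
tb_frac <- as.data.frame(prop.table(table(virginica_lengths)))
names(tb_frac) <- c("Sepal.Length", "Fraction")

p <- ggplot(tb_frac, aes(x = Sepal.Length, y = Fraction)) +
  geom_bar(stat = "identity")

# print(p) draws the figure
```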
Upvotes: 0