Reputation:
I created the plot below using:
ggplot(data_all, aes(x = Speed, fill = Season)) +
  theme_bw() +
  geom_histogram(position = "identity", alpha = 0.2, binwidth = 0.1)
As you can see, the amount of data available differs greatly between seasons. Is there a way to compare only the shape of the distributions, regardless of how many observations each season has?
Upvotes: 1
Views: 927
Reputation: 13793
You can reference other values calculated by the stat functions using a notation you may have seen before: ..value.. (a name wrapped in double dots). The ggplot2 documentation calls these "computed variables" (you will also see "special variables" or "calculated aesthetics"), and each stat's help page lists the ones it provides in a "Computed variables" section, e.g. ?stat_bin for histograms.
In this case, the default calculated aesthetic on the y axis for geom_histogram() is ..count.. (the number of observations in each bin). When comparing distributions with very different total N, it's more useful to use ..density.., which rescales each group's bars so that their areas sum to 1. You can access ..density.. by mapping it to the y aesthetic directly in the geom_histogram() call.
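To make the rescaling concrete, here is a small base R sketch (no ggplot2 involved) of what a density scale does: each bin's count is divided by the total count times the bin width. The sample x and the bin count are made up for illustration, and hist() is only standing in for the binning step.

```r
set.seed(1)
x <- rnorm(500)                      # illustrative sample, not OP's data

# Bin the data without drawing anything
h <- hist(x, breaks = 20, plot = FALSE)
binwidth <- diff(h$breaks)           # width of each bin

# density = count / (total count * bin width)
dens <- h$counts / (sum(h$counts) * binwidth)

all.equal(dens, h$density)           # matches what hist() itself reports
sum(dens * binwidth)                 # bar areas add up to 1
```

Because the total count is divided out, two groups of very different size end up on the same vertical scale.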
First, here's an example of two histograms with vastly different sizes (similar to OP's question):
library(ggplot2)
set.seed(8675309)
df <- data.frame(
  x = c(rnorm(1000, -1, 0.5), rnorm(100000, 3, 1)),
  group = c(rep("A", 1000), rep("B", 100000))
)
ggplot(df, aes(x, fill=group)) + theme_classic() +
  geom_histogram(
    alpha=0.2, color='gray80',
    position="identity", bins=80)
And here's the same plot using ..density.. on the y axis:
ggplot(df, aes(x, fill=group)) + theme_classic() +
  geom_histogram(
    aes(y=..density..), alpha=0.2, color='gray80',
    position="identity", bins=80)
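One note for newer ggplot2 versions: the ..density.. notation is deprecated as of ggplot2 3.4.0 in favor of after_stat(density), which is available from 3.3.0 onward. Below is a sketch of the same plot in that spelling, assuming a recent ggplot2 is installed. The layer_data() check at the end is just my own sanity check that each group's bars have a total area of 1; the names p, bars and areas are only for this example.

```r
library(ggplot2)

set.seed(8675309)
df <- data.frame(
  x = c(rnorm(1000, -1, 0.5), rnorm(100000, 3, 1)),
  group = c(rep("A", 1000), rep("B", 100000))
)

# after_stat(density) is the newer spelling of ..density..
p <- ggplot(df, aes(x, fill = group)) + theme_classic() +
  geom_histogram(
    aes(y = after_stat(density)), alpha = 0.2, color = "gray80",
    position = "identity", bins = 80)
p

# Sanity check: pull the computed bars back out of the plot and
# add up bar area (density * width) separately for each group
bars <- layer_data(p)
areas <- tapply(bars$density * (bars$xmax - bars$xmin), bars$group, sum)
areas
```

Both groups should come out at 1 (up to floating point error), which is why the two histograms become directly comparable despite the 100x difference in sample size.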
Upvotes: 3