Reputation: 351
I have a dataset for the 2002 NYC Marathon and the places of each person. I also have the gender for each person.
When I plot a histogram, grouping by gender, the counts for female are off!
When I plot a FreqPoly plot, the distribution is as expected based on the data.
Can anyone explain this discrepancy? The red bars are for females and the blue bars are for males. The same colors apply to the freq_poly graph.
The red line is where the female racers' counts should be, but the histogram shows them at much higher values. Why?
Upvotes: 2
Views: 922
Reputation: 12699
Not an answer, but a visualisation of the different position options discussed in Ian Campbell's and teunbrand's answers:
library(ggplot2)
set.seed(1)
p1 <- ggplot()+
geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "dodge")+
ggtitle("position = dodge")
set.seed(1)
p2 <- ggplot()+
geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "identity")+
ggtitle("position = identity")
set.seed(1)
p3 <- ggplot()+
geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)))+
ggtitle("position = stack")
library(patchwork)
p1/p2/p3
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2020-07-11 by the reprex package (v0.3.0)
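To put numbers on what the three panels show, here is a minimal sketch that pulls the computed bar data out of the plots with layer_data(). The object names (df, p_stack, p_ident, d_stack, d_ident) and bins = 10 are my own choices for illustration, not anything from the question. The idea is that the per-group count column is the same either way, but with the default stacking the tallest bar top in each bin equals the sum of both groups' counts, whereas with position = "identity" every bar top equals its own group's count.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(100), g = factor(rep(1:2, 50)))

# Same data and bins; only the position adjustment differs
p_stack <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 10)                         # default: position = "stack"
p_ident <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 10, position = "identity")

d_stack <- layer_data(p_stack)
d_ident <- layer_data(p_ident)

# Identity: each bar starts at 0, so its top is the group's own count
ident_ok <- all(d_ident$ymax == d_ident$count)

# Stack: the highest bar top in each bin is the total over both groups
stack_top   <- tapply(d_stack$ymax,  d_stack$xmin, max)
stack_total <- tapply(d_stack$count, d_stack$xmin, sum)
stack_ok <- isTRUE(all.equal(as.numeric(stack_top), as.numeric(stack_total)))

ident_ok
stack_ok
```

Both checks should come back TRUE, which is why one group's bars appear too tall in a stacked histogram: they are drawn on top of the other group's bars rather than from zero.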
Upvotes: 0
Reputation: 24790
To elaborate on what teunbrand states in the comment, the problem is that your histogram bars are being stacked on top of each other. This is because the default position argument for geom_histogram is position = "stack". This is in contrast to geom_freqpoly, where the default is position = "identity".
Thus, all you need to do is add position = "identity":
data(nym.2002, package = "UsingR")
ggplot(nym.2002, aes(x = place)) +
geom_freqpoly(aes(color = gender)) +
geom_histogram(aes(fill = gender),
alpha = 0.2,
position = "identity")
If you check out help(geom_freqpoly), you can find the default arguments for yourself.
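If you would rather check from the console than read the help page, one quick way is to inspect the function signatures with formals(). This is only a sketch, and it assumes that your installed ggplot2 version still defines both geoms as ordinary functions with a position argument. The variable names hist_default and freq_default are just for illustration.

```r
library(ggplot2)

# Look up the default value of the `position` argument of each geom
hist_default <- formals(geom_histogram)$position
freq_default <- formals(geom_freqpoly)$position

hist_default
freq_default
```

Given the defaults described above, the first should be "stack" and the second "identity".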
Upvotes: 1