Kevin Lee

Reputation: 351

Histogram not showing correct count/values? (Histogram vs Geom Freqpoly)

I have a dataset for the 2002 NYC Marathon and the places of each person. I also have the gender for each person.

When I plot a histogram, grouping by gender, the counts for female are off!

When I plot a FreqPoly plot, the distribution is as expected based on the data.

Can anyone explain this discrepancy? The red bars are for females and the blue bars are for males. The same colors apply to the freq_poly graph.

The red line shows where the female racers' counts should be, but the histogram shows them at much higher values. Why?

Upvotes: 2

Views: 922

Answers (2)

Peter

Reputation: 12699

Not an answer but a visualisation of the different position options as discussed in Ian Campbell's and teunbrand's answers


library(ggplot2)
set.seed(1)
p1 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "dodge")+
  ggtitle("position = dodge")

set.seed(1)
p2 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "identity")+
  ggtitle("position = identity")

set.seed(1)
p3 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)))+
  ggtitle("position = stack")


library(patchwork)

p1/p2/p3
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-07-11 by the reprex package (v0.3.0)

Upvotes: 0

Ian Campbell

Reputation: 24790

To elaborate on what teunbrand states in the comments: the problem is that your histogram bars are being stacked on top of each other, because the default for geom_histogram is position = "stack". geom_freqpoly, by contrast, defaults to position = "identity", so each group's counts are plotted as they are, without being added to the other group's.

Thus, all you need to do is add position = "identity":

library(ggplot2)
data(nym.2002, package = "UsingR")
ggplot(nym.2002, aes(x = place)) + 
  geom_freqpoly(aes(color = gender)) + 
  geom_histogram(aes(fill = gender),
                 alpha = 0.2,
                 position = "identity")
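
To see the stacking in numbers rather than in a picture, here is a minimal sketch on made-up data (the `toy` data frame and its counts are hypothetical, not the marathon data). It uses `layer_data()` to pull out the bar coordinates ggplot2 computes for each position setting.

```r
library(ggplot2)

# Hypothetical toy data: two bins, 10 and 20 males, 3 and 4 females
toy <- data.frame(
  place  = c(rep(0.5, 10), rep(1.5, 20), rep(0.5, 3), rep(1.5, 4)),
  gender = rep(c("Male", "Female"), times = c(30, 7))
)

base <- ggplot(toy, aes(place, fill = gender))

# Bar coordinates as computed by ggplot2 for each position setting
stacked  <- layer_data(base + geom_histogram(breaks = c(0, 1, 2)))
overlaid <- layer_data(base + geom_histogram(breaks = c(0, 1, 2),
                                             position = "identity"))

# Group 1 is "Female" (groups follow the alphabetical order of the levels)
cols <- c("group", "xmin", "count", "ymin", "ymax")
stacked[stacked$group == 1, cols]   # ymax = female count + male count
overlaid[overlaid$group == 1, cols] # ymax = female count only
```

In the stacked version the female bars start where the male bars end, so their tops sit at the combined count (13 and 24 here) even though `count` is still 3 and 4. With position = "identity" every bar starts at zero and `ymax` equals `count`.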

If you check out help(geom_freqpoly), you can find the default arguments for yourself.
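
As an alternative to reading the help page, the defaults can probably also be read straight from the function signatures. This is a small sketch that assumes both geoms expose `position` as an ordinary argument with a literal default, which may not hold in every ggplot2 version.

```r
library(ggplot2)

# Default `position` argument of each geom, taken from its signature
formals(geom_histogram)$position
formals(geom_freqpoly)$position
```

If the assumption holds, the two lines should print "stack" and "identity" respectively, matching the defaults described above.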

Upvotes: 1
