Wincow

Reputation: 79

ggplot2 histogram has solid line along x axis for which there are no values

I'm using ggplot2 to plot the frequency of distance measurements for various roads. The y axis is frequency and the x axis is distance. In all of my plots (not just this one) there is a solid line along the 0 frequency value for all distances, as in this graph:

[image: histogram of road lengths]

For example, in the image I provided, the maximum road distance is 25, but the line stretches to 30. Whatever I set xlim to, the line stretches to that maximum. I'm not sure what in the code is causing this. Below is the code I'm using:

ggplot(ln_jan, aes(x=kilo, color=zone_sm)) +
  geom_histogram(fill="black", alpha=.8, position="identity", size =1.15)+
  xlim(0, 30)+
  ylim(0, 4000)+
  ggtitle("Road lengths")+
  ylab("Frequency")+ 
  xlab("Distance (km)")+
  theme(plot.title = element_text(hjust = 0.5, size = 21, face = "bold"))+
  scale_color_discrete(name = "road types", 
  labels=c("highways", "small roads"))+
  theme(axis.text=element_text(size=10, face = "bold"),
        axis.title=element_text(size=14,face="bold"))+
  theme(panel.background = element_rect(fill = 'gray70'))+
  theme(plot.title = element_text(size=26))

Here is the head of the dataset for reproducing the problem:

ID     kilo       zone_sm
185   12.522931      NW
234   12.702159      NW
25315  1.939652      NE
25411  1.938117      NE
25507  1.936778      NE
25603  1.935634      NE

As requested, here is the output of dput(hist(ln_jan$kilo)):

structure(list(breaks = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26), counts = c(6079L, 8898L, 4240L, 2414L, 1677L, 986L, 760L, 609L, 394L, 639L, 338L, 53L, 14L), density = c(0.112154533043061, 0.1641636839969, 0.078225895723405, 0.0445371019519575, 0.0309398177189034, 0.0181912106564333, 0.0140216228183462, 0.0112357477583853, 0.00726910446109, 0.0117892328696358, 0.00623593225342238, 0.000977823696542563, 0.000258293051916903), mids = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25), xname = "ln_jan$kilo", equidist = TRUE), class = "histogram")

The result of length(which(ln_jan$kilo>25)) is 1. That single value is 25.01803... (I rounded a little).

Upvotes: 2

Views: 1088

Answers (1)

phalteman

Reputation: 3542

Specifying xlim() sets the limits of the x scale, so ggplot computes bins across the whole range up to 30. The bins beyond your data have a frequency of 0, but their outlines are still drawn, and the thickness of your line makes them far more obvious than they would otherwise be. You can use coord_cartesian() instead of xlim(): it only zooms the view to the x axis range you want, without changing the scale limits used for binning, and leaves the styling of your graph the same. Here is an example using a recreated data set:

library(ggplot2)

set.seed(1)
df <- data.frame(x=exp(rnorm(100)))

p <- ggplot(df, aes(x)) +
  geom_histogram(fill="transparent", colour="black", size=2)

p + xlim(0,15)
p + coord_cartesian(xlim=c(0,15)) #<-- this figure shown

[image: histogram produced with coord_cartesian(xlim=c(0,15))]
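To check this explanation rather than take it on faith, one option is to inspect the bins that ggplot actually computes. The sketch below uses ggplot_build() on the same recreated data; the names bins_xlim and bins_coord are only illustrative, and bins = 30 is set explicitly (it is the default) to silence the binning message.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = exp(rnorm(100)))

p <- ggplot(df, aes(x)) +
  geom_histogram(bins = 30)

# Layer data (one row per bin) as computed under each approach
bins_xlim  <- suppressWarnings(ggplot_build(p + xlim(0, 15))$data[[1]])
bins_coord <- ggplot_build(p + coord_cartesian(xlim = c(0, 15)))$data[[1]]

# Largest observed value, for comparison with the bin edges
max(df$x)

# Right edge of the last bin under each approach;
# na.rm = TRUE because edges outside the scale limits can be set to NA
max(bins_xlim$xmax, na.rm = TRUE)
max(bins_coord$xmax, na.rm = TRUE)

# Number of empty bins that xlim() creates across the scale range
sum(bins_xlim$count == 0)
```

With xlim() the bins should extend well past the largest observation and include empty ones, while with coord_cartesian() they should stop near the data maximum. Those empty bins are what show up as the line along the axis.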

Updated code for your plot would look like:

ggplot(ln_jan, aes(x=kilo, color=zone_sm)) +
  geom_histogram(fill="black", alpha=.8, position="identity", size =1.15)+
  coord_cartesian(xlim=c(0, 30), ylim=c(0,4000)) +
  ggtitle("Road lengths")+
  ylab("Frequency")+ 
  xlab("Distance (km)")+
  theme(plot.title = element_text(hjust = 0.5, size = 21, face = "bold"))+
  scale_color_discrete(name = "road types", 
  labels=c("highways", "small roads"))+
  theme(axis.text=element_text(size=10, face = "bold"),
        axis.title=element_text(size=14,face="bold"))+
  theme(panel.background = element_rect(fill = 'gray70'))+
  theme(plot.title = element_text(size=26))

Upvotes: 1
