I'm using ggplot2 to plot the frequency of distance measurements for various roads. The y axis is frequency and the x axis is distance. In all of my plots (not just this one) there is a solid line along the 0 frequency value for all distances; see the graph here:
For example, in the image I provided, the maximum road distance is 25, but the line stretches to 30. No matter what I set the xlim to, the line stretches to that maximum distance. I'm not sure what in the code is causing this. Below is the code I'm using to get this:
ggplot(ln_jan, aes(x=kilo, color=zone_sm)) +
geom_histogram(fill="black", alpha=.8, position="identity", size =1.15)+
xlim(0, 30)+
ylim(0, 4000)+
ggtitle("Road lengths")+
ylab("Frequency")+
xlab("Distance (km)")+
theme(plot.title = element_text(hjust = 0.5, size = 21, face = "bold"))+
scale_color_discrete(name = "road types",
labels=c("highways", "small roads"))+
theme(axis.text=element_text(size=10, face = "bold"),
axis.title=element_text(size=14,face="bold"))+
theme(panel.background = element_rect(fill = 'gray70'))+
theme(plot.title = element_text(size=26))
Here is the head of the dataset for reproducing the problem:
ID kilo zone_sm
185 12.522931 NW
234 12.702159 NW
25315 1.939652 NE
25411 1.938117 NE
25507 1.936778 NE
25603 1.935634 NE
As requested, here is the output of dput(hist(ln_jan$kilo)):
structure(list(breaks = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26), counts = c(6079L, 8898L, 4240L, 2414L, 1677L, 986L, 760L, 609L, 394L, 639L, 338L, 53L, 14L), density = c(0.112154533043061, 0.1641636839969, 0.078225895723405, 0.0445371019519575, 0.0309398177189034, 0.0181912106564333, 0.0140216228183462, 0.0112357477583853, 0.00726910446109, 0.0117892328696358, 0.00623593225342238, 0.000977823696542563, 0.000258293051916903), mids = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25), xname = "ln_jan$kilo", equidist = TRUE), class = "histogram")
The result of length(which(ln_jan$kilo>25)) is 1. That single value is 25.01803... (I rounded a little).
Upvotes: 2
Views: 1088
The issue you're seeing is that xlim() sets the limits of the x scale, and the histogram is binned across that whole range. ggplot therefore has to show the frequency of observations in every bin up to 30, which means you get bins with 0 frequency up to that limit. Each empty bin is still drawn as a zero-height bar, and the thickness of your outline makes it far more obvious than it would otherwise be. You can use coord_cartesian() instead of xlim() to show the x axis you want and leave the styling of your graph the same. Here is an example using a recreated data set:
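library(ggplot2)

set.seed(1)
# Right-skewed toy data standing in for the road lengths
df <- data.frame(x=exp(rnorm(100)))
# Thick outline, as in the question (ggplot2 >= 3.4.0 prefers linewidth= over size=)
p <- ggplot(df, aes(x)) +
  geom_histogram(fill="transparent", colour="black", size=2)
p + xlim(0,15)                     # zero-count bins show up between the largest value and 15
p + coord_cartesian(xlim=c(0,15)) #<-- this figure shown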
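If you want to check the binning explanation without relying on the picture, you can inspect the computed bins with layer_data(). This is only a rough sketch on the same recreated data; the names bins_xlim, bins_coord and n_empty_xlim are made up for illustration, and bins = 30 is set explicitly just to silence the default-bins message.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = exp(rnorm(100)))

p <- ggplot(df, aes(x)) + geom_histogram(bins = 30)

# layer_data() returns the data computed for a layer: one row per bin,
# with the bin centre in `x` and the frequency in `count`
bins_xlim  <- layer_data(p + xlim(0, 15))
bins_coord <- layer_data(p + coord_cartesian(xlim = c(0, 15)))

# Largest observed value in the data
print(max(df$x))

# Right-most bin centre under each approach
# (na.rm guards against edge bins that the scale limits turn into NA)
print(max(bins_xlim$x, na.rm = TRUE))
print(max(bins_coord$x, na.rm = TRUE))

# Empty bins lying beyond the largest observation when xlim() is used;
# these are the zero-height bars that show up as a line
n_empty_xlim <- sum(bins_xlim$count == 0 & bins_xlim$x > max(df$x), na.rm = TRUE)
print(n_empty_xlim)
```

With xlim() the bin centres should run out to the scale limit, while with coord_cartesian() they should stop around the largest observation, since the bins then come from the range of the data rather than from the axis limits.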
Updated code for your plot would look like:
ggplot(ln_jan, aes(x=kilo, color=zone_sm)) +
geom_histogram(fill="black", alpha=.8, position="identity", size =1.15)+
coord_cartesian(xlim=c(0, 30), ylim=c(0,4000)) +
ggtitle("Road lengths")+
ylab("Frequency")+
xlab("Distance (km)")+
theme(plot.title = element_text(hjust = 0.5, size = 21, face = "bold"))+
scale_color_discrete(name = "road types",
labels=c("highways", "small roads"))+
theme(axis.text=element_text(size=10, face = "bold"),
axis.title=element_text(size=14,face="bold"))+
theme(panel.background = element_rect(fill = 'gray70'))+
theme(plot.title = element_text(size=26))
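One side effect to be aware of: moving ylim into coord_cartesian() also changes how bars taller than the limit are treated. As far as I understand it, scale limits (ylim()) turn out-of-range values into NA so those bars get dropped with a warning, whereas coord_cartesian() only zooms and clips them. A small sketch with made-up data (toy, d_scale and d_coord are hypothetical names, not from your data set):

```r
library(ggplot2)

# Hypothetical toy data: one tall bin (50 observations) and one short bin (5)
toy <- data.frame(x = c(rep(1, 50), rep(2, 5)))
p <- ggplot(toy, aes(x)) + geom_histogram(binwidth = 1)

# Scale limits: a bar height outside 0..10 is censored to NA,
# so that bar is removed when the plot is drawn
d_scale <- layer_data(p + ylim(0, 10))

# Coordinate limits: the computed counts are left alone,
# the tall bar is simply cut off at the top of the panel
d_coord <- layer_data(p + coord_cartesian(ylim = c(0, 10)))

# Number of bar heights lost under each approach
print(sum(is.na(d_scale$y)))
print(sum(is.na(d_coord$y)))
```

So if any of your bins goes above 4000, the coord_cartesian() version will show it as a bar running off the top of the plot instead of leaving a gap.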
Upvotes: 1