melbez

Reputation: 1000

Vertical line between bins on histogram in ggplot

I would like to be able to add a vertical line at exactly 28.5, between bars 26.5 and 30.5. This is the graph I have so far. How can I add a line to this?


The data necessary to generate this is a single vector with values from 0 to 76.5. This is then broken into bins, as shown below. The purpose of this histogram is to show the number of items in each bin.

This is the code I am currently using. The last line of the code is my attempt to add the vertical line, but it does not work. To plot this, I used the instructions here.

breaks <- c(0, 0.5, 4.5, 8.5, 12.5, 16.5, 20.5, 24.5, 28.5, 32.5, 36.5, 40.5, 44.5, 
        48.5, 52.5, 56.5, 60.5, 64.5, 68.5, 72.5, 76.5)
tags <- c(0, 2.5, 6.5, 10.5, 14.5, 18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5, 
      50.5, 54.5, 58.5, 62.5, 66.5, 70.5, 74.5)
group_tags <- cut(X2miledata_2020$hrs_82, breaks = breaks, include.lowest = TRUE, 
right = FALSE, labels = tags)
summary(group_tags)

ggplot(data = as_tibble(group_tags), mapping = aes(x = value)) + 
  geom_bar(fill = "bisque", color = "white", alpha = 0.7) +
  stat_count(geom="text", 
aes(label=sprintf("%.2f",..count../length(group_tags))), vjust=0) +
  labs(x='HRS scores') +
  theme_minimal() + 
  geom_vline(xintercept = 28.5)

Upvotes: 1

Views: 1355

Answers (1)

dc37

Reputation: 16178

In your data, the value 28.5 does not fall between the 26.5 and 30.5 bars. Your cut() call uses right = FALSE (together with include.lowest = TRUE), so the intervals are closed on the left, and 28.5 is counted as part of the group "30.5".

Here is an example:

df <- data.frame(x = rnorm(100, mean = 38.5, sd = 10))

library(dplyr)

df %>% add_row(x = 28.5) %>%
  mutate(group_tags = cut(x, breaks = breaks, include.lowest = TRUE, 
                          right = FALSE, labels = tags)) %>%
  filter(x == 28.5)

     x group_tags
1 28.5       30.5
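
To see which interval a value lands in, it can help to drop the labels so that cut() names each level by its interval. This is only a minimal sketch: the break points are copied from the question, and the variable name bins is just for illustration.

```r
# Break points copied from the question
breaks <- c(0, 0.5, 4.5, 8.5, 12.5, 16.5, 20.5, 24.5, 28.5, 32.5, 36.5, 40.5, 44.5,
            48.5, 52.5, 56.5, 60.5, 64.5, 68.5, 72.5, 76.5)

# Without `labels`, cut() names each level by its interval.
# right = FALSE makes the intervals closed on the left: [a, b)
bins <- cut(c(28.4, 28.5), breaks = breaks, include.lowest = TRUE, right = FALSE)

as.character(bins)
# "[24.5,28.5)" "[28.5,32.5)"
```

So 28.4 still belongs to the bar labelled 26.5, while 28.5 is the first value of the bar labelled 30.5.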

So you have two options, depending on whether you want to draw the line at the exact value of 28.5 (that is, on the group "30.5") or between the 26.5 and 30.5 bars.

For the first option, create a second dataset containing this particular value, as above, and use geom_segment to draw a line at the position of the group_tags level that 28.5 falls into. In the code below, this option is drawn as a red line.

For the second option, you can count the bars manually and pass the resulting position to geom_vline. Because the x axis is discrete, each bar occupies one unit, counting from the left. In my example there are 13 bars; 26.5 is the 4th and 30.5 is the 5th, so I place the geom_vline at 4.5 (blue line). In your example, geom_vline(xintercept = 8.5) should work.
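
Instead of counting bars by hand, the position can also be looked up from the bar labels. The sketch below is only an illustration: gap_after is a hypothetical helper (not part of ggplot2), and it assumes you pass the labels in the same order in which the bars appear on the x axis.

```r
# Bin labels copied from the question
tags <- c(0, 2.5, 6.5, 10.5, 14.5, 18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5,
          50.5, 54.5, 58.5, 62.5, 66.5, 70.5, 74.5)

# Hypothetical helper: x position of the gap just to the right of a given bar.
# `shown` must list the bar labels in the order they appear on the x axis.
gap_after <- function(shown, label) {
  match(label, shown) + 0.5
}

# Assumption: every bin is non-empty, so the axis shows all tags in order
gap_after(as.character(tags), "26.5")
# 8.5
```

If some bins are empty, the discrete axis will most likely not show them (ggplot2 drops unused factor levels by default), so passing something like levels(droplevels(group_tags)) as shown would be the safer input. The result can then be used as geom_vline(xintercept = ...).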

Here is the code to generate the graph below:

library(dplyr)

DF <- df %>% mutate(group_tags = cut(x, breaks = breaks, include.lowest = TRUE, 
                          right = FALSE, labels = tags)) 

gv <- df %>% add_row(x = 28.5) %>%
  mutate(group_tags = cut(x, breaks = breaks, include.lowest = TRUE, 
                          right = FALSE, labels = tags)) %>%
  filter(x == 28.5)

library(ggplot2)

ggplot(DF, aes(x = as.character(group_tags)))+
  geom_bar(fill = "bisque", color = "white", alpha = 0.7)+
  geom_segment(data = gv, 
             aes(x = group_tags, xend = group_tags, 
                 y = -Inf, yend = Inf,group = 1),color = "red" )+
  geom_vline(xintercept = 4.5, color = "blue")+
  stat_count(geom="text", 
             aes(label=sprintf("%.2f",..count../length(DF$group_tags))), 
             vjust=0) +
  labs(x='HRS scores') +
  theme_minimal() 


Does this answer your question?

Upvotes: 2
