Reputation: 577
I have a data frame with three columns: a factor (the chapter of a book), a numerical ID (the position of the sentence in the book), and a value (the number of words in the sentence). It looks something like this:
sentence.length
# A tibble: 5,368 x 3
   Chapter    ID Length
   <fct>   <dbl>  <dbl>
 1 1           1    294
 2 1           2     19
 3 1           3     77
 4 1           4     57
 5 1           5     18
 6 1           6     18
 7 1           7     27
 8 1           8     56
 9 1           9     32
10 1          10     25
# ... with 5,358 more rows
I have a plot that is very close to what I want.
ggplot(sentence.length, aes(x=ID, y=Length, fill=Chapter)) +
  geom_bar(stat='identity')
What I'd like to add, over every group, is a horizontal line representing the mean of that group.
This code, modified from another question, gets me close:
stat_summary(fun.y = mean, aes(x = 1, yintercept = ..y.., group = Chapter), geom = "hline")
But the lines extend across the entire plot; is there a way to draw each mean line only over the relevant portion of the plot? I suspect the issue is that my data happens to be ordered such that each group corresponds to a contiguous part of the plot, but nothing in the aesthetics of the plot itself requires this.
An even closer approach is to use geom_smooth rather than stat_summary; geom_smooth(method='lm',se=FALSE) gets me really close. But rather than a linear regression, I really just want the mean for each group (here, the per-chapter mean sentence length).
Is there a better/simpler approach?
Upvotes: 1
Views: 56
Reputation: 2399
I'm not sure if it's the simplest way to do this, but it works:
library(tidyverse)
library(wrapr)
df %.>%
  ggplot(data = ., aes(
    x = ID,
    y = Length,
    fill = Chapter
  )) +
  geom_col() +
  geom_segment(data = group_by(., Chapter) %>%
                 summarise(
                   mean_len = mean(Length),
                   min_id = min(ID),
                   max_id = max(ID)
                 ),
               aes(
                 x = min_id,
                 xend = max_id,
                 y = mean_len,
                 yend = mean_len
               ),
               color = 'steelblue',
               size = 1.2
  )
With the %.>% pipe (from wrapr) you can pass df down and summarise it inside the geom_segment call. After %.>%, df is accessible as the dot (.).
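If you'd rather not depend on wrapr, the same idea can probably be written by computing the per-chapter summary first and handing it to geom_segment as its own data. The sketch below is only an illustration: the toy data frame is made up to stand in for sentence.length, chapter_means is a name chosen here, and linewidth assumes ggplot2 3.4 or later (older versions use size instead).

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the asker's data (made-up values, three small chapters)
sentence.length <- data.frame(
  Chapter = factor(rep(1:3, each = 4)),
  ID      = 1:12,
  Length  = c(294, 19, 77, 57, 18, 18, 27, 56, 32, 25, 40, 12)
)

# One row per chapter: the mean length and the ID range the chapter spans
chapter_means <- sentence.length %>%
  group_by(Chapter) %>%
  summarise(
    mean_len = mean(Length),
    min_id   = min(ID),
    max_id   = max(ID)
  )

# Bars from the full data, one horizontal segment per chapter from the summary
p <- ggplot(sentence.length, aes(x = ID, y = Length, fill = Chapter)) +
  geom_col() +
  geom_segment(
    data = chapter_means,
    aes(x = min_id, xend = max_id, y = mean_len, yend = mean_len),
    inherit.aes = FALSE,   # the segment layer uses only its own aesthetics
    color = "steelblue",
    linewidth = 1.2        # assumes ggplot2 >= 3.4; use size = 1.2 on older versions
  )

print(p)
```

Like the version above, this relies on each chapter occupying a contiguous range of IDs, since each segment simply runs from the chapter's smallest ID to its largest.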
Upvotes: 1