Saron B

Reputation: 39

Code for adding an average line in ggplot?

I used the code below to make the ggplot shown in the image further down. The plot is a duration curve with water discharge on the y-axis and the percentage of time on the x-axis. Each line represents one year of water discharge measurements; there are 20 years in total, so 20 lines. I want to use gghighlight to highlight the average water discharge over time. How can I add the average water discharge?

library(dplyr)
library(lubridate)  # for year()
library(ggplot2)

sy2.1 %>% 
  group_by(year(t)) %>% 
  arrange(desc(WaterDis)) %>% 
  mutate(t3 = 1:n()/n()*100) %>% 
  ggplot(aes(t3, WaterDis, colour=year(t),
             group=year(t))) +
  geom_line(linewidth=1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  scale_y_continuous(expand=c(0, 0)) +
  scale_x_continuous(expand=c(0.001, 0)) +
  labs(x="% of time", y="Water discharge (m3/s)", colour="Year") +
  theme_classic()

[Image: duration curves of water discharge, one line per year]

Upvotes: 1

Views: 3744

Answers (2)

joy

Reputation: 247

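First build the duration curve for each year, then compute the average curve (the mean discharge across all years at each % of time), add it to the data as its own group, and let gghighlight() pick it out. This is sample code; adjust it to your data.

library(dplyr)
library(lubridate)
library(ggplot2)
library(gghighlight)

# Duration curve per year: sort discharge in decreasing order within each year
curves <- sy2.1 %>%
  mutate(year = as.character(year(t))) %>%
  group_by(year) %>%
  arrange(desc(WaterDis), .by_group = TRUE) %>%
  mutate(t3 = row_number() / n() * 100) %>%
  ungroup()

# Average curve: mean across years at each t3 (assumes equal-length years)
sy2.1_avg <- curves %>%
  group_by(t3) %>%
  summarize(WaterDis = mean(WaterDis)) %>%
  mutate(year = "Average")

bind_rows(curves, sy2.1_avg) %>%
  ggplot(aes(x = t3, y = WaterDis, colour = year, group = year)) +
  geom_line(linewidth = 1) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0)) +
  labs(x = "% of time", y = "Water discharge (m3/s)", colour = "Year") +
  theme_classic() +
  # Keep only the "Average" line in colour; grey out the individual years
  gghighlight(year == "Average", use_group_by = FALSE, label_params = list(colour = "red"))

Here gghighlight() greys out the individual years and keeps the average curve in colour, with a red label. sy2.1_avg holds the average duration curve, i.e. the mean discharge across all years at each % of time.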

Upvotes: 0

chemdork123

Reputation: 13793

You can either summarize your data first and then plot the summarized data, or summarize directly within your plot code using stat_summary(). I'll show the latter method below with an example dataset.

Here's the data and basic plot.

library(ggplot2)

set.seed(1234)

# Three log-shaped lines, each shifted by a single random offset
df <- data.frame(
  x=rep(1:100, 3),
  y=c(log(1:100)+rnorm(1, 0.02, 1), log(1:100)+rnorm(1, 0.02, 0.01), log(1:100)+rnorm(1, -0.04, 0.01)),
  category=rep(LETTERS[1:3], each=100)
)

p <- ggplot(df, aes(x,y,color=category)) +
  geom_line(linewidth=0.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_classic()
p

[Image: basic plot of the three lines A, B and C]

To find the average of the lines, you can use stat_summary() and tell it to use the mean() function.

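# Setting color to a fixed value drops the inherited color=category mapping,
# so mean() is taken across all three categories at each x
p + stat_summary(geom="line", fun = mean, color="black", linewidth=1, linetype="dashed")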

[Image: the same plot with the mean of the three lines added as a dashed black line]

Personally, I use both methods (the one shown here, or summarizing beforehand), depending on the situation.
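For comparison, the summarize-first route on the same example data could look like the sketch below. It assumes dplyr is available and uses ggplot2 >= 3.4.0 (for `linewidth`); `df_avg` and `p2` are names introduced here for illustration. The mean is computed once per x, then drawn as its own layer.

```r
library(dplyr)
library(ggplot2)

set.seed(1234)
df <- data.frame(
  x = rep(1:100, 3),
  y = c(log(1:100) + rnorm(1, 0.02, 1), log(1:100) + rnorm(1, 0.02, 0.01),
        log(1:100) + rnorm(1, -0.04, 0.01)),
  category = rep(LETTERS[1:3], each = 100)
)

# Summarize first: one mean y per x, across all three categories
df_avg <- df %>%
  group_by(x) %>%
  summarise(y = mean(y))

p2 <- ggplot(df, aes(x, y, color = category)) +
  geom_line(linewidth = 0.5) +
  # inherit.aes = FALSE keeps the plot-level color = category mapping off
  # this layer, since df_avg has no category column
  geom_line(data = df_avg, aes(x, y), inherit.aes = FALSE,
            color = "black", linewidth = 1, linetype = "dashed") +
  theme_classic()
```

Summarizing first leaves the average in a data frame you can inspect or reuse; stat_summary() keeps everything inside the plot call.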

As a final note, your lines are coloured on a continuous scale, but the years are really discrete groups in your example. I would force ggplot2 to treat them as a factor by using factor(year(t)) (or as.factor(year(t))) instead of year(t) in your aes().
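Applying both suggestions to duration curves like the ones in the question might look like this minimal sketch. It assumes a hypothetical stand-in data frame `sim` with a Date column `t` and a numeric `WaterDis` column like `sy2.1`, covering three non-leap years so every year has 365 values and the % of time values line up exactly. One extra detail: the question's plot maps `group = year` explicitly, and fixing `colour` does not remove that inherited group mapping, so `aes(group = 1)` is used to make stat_summary() average across years. Requires ggplot2 >= 3.4.0 for `linewidth`.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for sy2.1: daily discharge for three non-leap years
sim <- data.frame(t = seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day"))
sim$WaterDis <- rlnorm(nrow(sim), meanlog = 3, sdlog = 0.6)

# Duration curve per year; factor() gives a discrete colour scale
curves <- sim %>%
  mutate(year = factor(year(t))) %>%
  group_by(year) %>%
  arrange(desc(WaterDis), .by_group = TRUE) %>%
  mutate(t3 = row_number() / n() * 100) %>%
  ungroup()

p <- ggplot(curves, aes(t3, WaterDis, colour = year, group = year)) +
  geom_line(linewidth = 0.5) +
  # group = 1 overrides the inherited group = year, so mean() is taken
  # across all years at each t3 value
  stat_summary(aes(group = 1), geom = "line", fun = mean,
               colour = "black", linewidth = 1, linetype = "dashed") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0)) +
  labs(x = "% of time", y = "Water discharge (m3/s)", colour = "Year") +
  theme_classic()
```

Print `p` to draw it. With years of different lengths (e.g. leap years), the t3 values no longer match across years, so you would need to round or interpolate t3 before averaging.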

Upvotes: 5
