I used the code below to make a ggplot. The plot is a duration curve showing water discharge on the y-axis and percentage of time on the x-axis. Each line represents the water discharge measurements for one year; in total there are 20 years, so 20 lines. I want to use gghighlight to highlight the average water discharge over time. How can I add the average water discharge?
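```r
library(dplyr)
library(lubridate)  # assumed source of year()
library(ggplot2)

sy2.1 %>%
  group_by(year(t)) %>%
  arrange(desc(WaterDis)) %>%      # sort discharge from highest to lowest
  mutate(t3 = 1:n()/n()*100) %>%   # rank within each year, as % of time
  ggplot(aes(t3, WaterDis, colour=year(t),
             group=year(t))) +
  geom_line(linewidth=1) +
  scale_y_continuous(expand=c(0, 0)) +
  scale_x_continuous(expand=c(0.001, 0)) +
  labs(x="% of time", y="Water discharge (m3/s)", colour="Year") +
  theme_classic()
```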
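For reference, the preparation steps above (sort each year's discharge from highest to lowest, then express each rank as a percentage of that year's observations) are what turn raw measurements into a duration curve. A minimal base-R sketch for a single vector, using a hypothetical helper `duration_curve()` and made-up values:

```r
# Hypothetical helper (not from the question): duration curve of one vector
duration_curve <- function(q) {
  q_sorted <- sort(q, decreasing = TRUE)  # highest discharge first
  data.frame(
    pct_time  = seq_along(q_sorted) / length(q_sorted) * 100,  # rank as % of all observations
    discharge = q_sorted
  )
}

dc <- duration_curve(c(3, 10, 1, 7))
dc
```

With four values, each step on the x-axis is 25% of the time, so the largest discharge (10) sits at 25% and the smallest (1) at 100%.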
Upvotes: 1
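First build the duration curve for each year, then calculate the average curve (the mean water discharge at each % of time across all years), add it to the data as an extra group, and highlight that group with gghighlight. This is sample code; adjust it to your data.

```r
library(dplyr)
library(lubridate)
library(ggplot2)
library(gghighlight)
sy2.1_dc <- sy2.1 %>%  # per-year duration curves, as in the question
  mutate(year = as.character(year(t))) %>%
  group_by(year) %>%
  arrange(desc(WaterDis), .by_group = TRUE) %>%
  mutate(t3 = row_number() / n() * 100) %>%
  ungroup()
# Average curve: mean discharge at each whole % of time across all years
sy2.1_avg <- sy2.1_dc %>%
  group_by(t3 = round(t3)) %>%
  summarize(WaterDis = mean(WaterDis), .groups = "drop") %>%
  mutate(year = "Average")

bind_rows(sy2.1_dc, sy2.1_avg) %>%
  ggplot(aes(t3, WaterDis, colour = year, group = year)) +
  geom_line(linewidth = 1) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0)) +
  labs(x = "% of time", y = "Water discharge (m3/s)", colour = "Year") +
  theme_classic() +  # grey out every curve except "Average", label it in red
  gghighlight(any(year == "Average"), label_key = year,
              label_params = list(colour = "red"))
```

Here, gghighlight() keeps the average curve in colour with a red label and greys out the individual years. sy2.1_avg holds the mean water discharge at each whole percentage of time, taken across all years; rounding t3 to whole percentages is an approximation that lets years of 365 and 366 days share the same x values.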
Upvotes: 0
You can either summarize your data first and then plot the summarized data, or you can summarize directly within your plot code using stat_summary(). I'll show you the latter method below with an example dataset.
Here's the data and basic plot.
```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  x = rep(1:100, 3),
  y = c(log(1:100) + rnorm(1, 0.02, 1),
        log(1:100) + rnorm(1, 0.02, 0.01),
        log(1:100) + rnorm(1, -0.04, 0.01)),
  category = rep(LETTERS[1:3], each = 100)
)

p <- ggplot(df, aes(x, y, color = category)) +
  geom_line(linewidth = 0.5) +
  theme_classic()
p
```
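To find the average of the lines, you can use stat_summary() and tell it to use the mean() function.

```r
# Setting color to a fixed value (instead of mapping it) means category no
# longer splits the data into groups, so mean() is taken across all three
# lines at each x
p + stat_summary(geom = "line", fun = "mean", color = "black", linewidth = 1, linetype = "dashed")
```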
Personally, I use both methods (the one shown here, or summarizing beforehand), depending on the situation.
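For comparison, here is a sketch of the summarize-first method mentioned at the start of this answer. It assumes dplyr is available (the answer itself only loads ggplot2) and names the result `df_avg` and `p_avg` purely for illustration. The mean y at each x goes into a small data frame that is drawn as a second geom_line() layer; the last lines use ggplot_build() to check that stat_summary() yields the same values.

```r
library(ggplot2)
library(dplyr)

# Same example data as above
set.seed(1234)
df <- data.frame(
  x = rep(1:100, 3),
  y = c(log(1:100) + rnorm(1, 0.02, 1),
        log(1:100) + rnorm(1, 0.02, 0.01),
        log(1:100) + rnorm(1, -0.04, 0.01)),
  category = rep(LETTERS[1:3], each = 100)
)

# Summarize first: one mean y per x, across all categories
df_avg <- df %>%
  group_by(x) %>%
  summarize(y = mean(y), .groups = "drop")

p_avg <- ggplot(df, aes(x, y, color = category)) +
  geom_line(linewidth = 0.5) +
  # inherit.aes = FALSE because df_avg has no 'category' column
  geom_line(data = df_avg, aes(x, y), inherit.aes = FALSE,
            color = "black", linewidth = 1, linetype = "dashed") +
  theme_classic()
p_avg

# Cross-check: the stat_summary() layer computes the same means
built <- ggplot_build(p_avg + stat_summary(geom = "line", fun = mean, color = "black"))$data[[3]]
built <- built[order(built$x), ]
```

Summarizing first is handy when you want to reuse the averaged values elsewhere (a table, a second plot); stat_summary() keeps everything inside the plot call.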
As a final note, your coloring scheme uses a continuous scale for the lines, but the years in your example are really discrete groups. I would force ggplot2 to treat the years as a factor by using as.factor(year(t)) or factor(year(t)) instead of year(t).
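Putting the two suggestions together for the question's duration curves, the sketch below maps colour to a factor year and adds the average with stat_summary(). One caveat: a leap year has 366 points, so its t3 values do not coincide with those of 365-day years, and stat_summary() would then average only the x values that happen to match. Rounding t3 to whole percentages in a hypothetical helper column `t3_bin` sidesteps this. `sy2.1` here is simulated with made-up lognormal discharges for 2003 to 2005 (2004 is a leap year) purely so the code runs, and year() is assumed to come from lubridate.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for sy2.1: simulated daily discharge for three years
set.seed(1)
dates <- seq(as.Date("2003-01-01"), as.Date("2005-12-31"), by = "day")
sy2.1 <- data.frame(t = dates,
                    WaterDis = rlnorm(length(dates), meanlog = 3, sdlog = 0.5))

dc <- sy2.1 %>%
  mutate(year = factor(year(t))) %>%              # discrete colour scale
  group_by(year) %>%
  arrange(desc(WaterDis), .by_group = TRUE) %>%
  mutate(t3 = row_number() / n() * 100,           # % of time, per year
         t3_bin = round(t3)) %>%                  # hypothetical: whole-percent bins
  ungroup()

p_dc <- ggplot(dc, aes(t3, WaterDis, colour = year, group = year)) +
  geom_line(linewidth = 1) +
  # Average across years within each whole-percent bin
  stat_summary(aes(x = t3_bin, group = 1), geom = "line", fun = mean,
               colour = "black", linewidth = 1, linetype = "dashed") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0.001, 0)) +
  labs(x = "% of time", y = "Water discharge (m3/s)", colour = "Year") +
  theme_classic()
p_dc
```

The dashed line has one point per whole percentage (0 to 100), so it stays smooth even when 365- and 366-day years are mixed.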
Upvotes: 5