Reputation: 3022
I have some data in a data frame, it looks like this (head) in data frame, df:
site year date value
1 MLO 1969 1969-08-20 323.95
2 MLO 1969 1969-08-27 324.58
3 MLO 1969 1969-09-02 321.61
4 MLO 1969 1969-09-12 321.15
5 MLO 1969 1969-09-24 321.15
6 MLO 1969 1969-10-03 320.54
I am using aggregate() to find the max value by year:
ag <- aggregate(df$value ~ df$year, data=df, max)
This works great, and I have the following (head) in ag:
df$year df$value
1 1969 324.58
2 1970 331.16
3 1971 325.89
4 1974 336.75
5 1976 333.87
6 1977 338.63
However, I'd like to plot the original data and then layer on the data from the aggregate and in order to do that I need a column with the full date field (the one that matches the maximum value) in the aggregate. In other words, I'd need each vector in the aggregate to look like:
df$date df$year df$value
1 1969-08-27 1969 324.58
and so on, so I can geom_point like so:
sp <- ggplot(df, aes(x=date, y=value)) +
labs(x="Year", y="Value")
sp + geom_point(colour="grey60", size=1) +
geom_point(data=ag, aes(x=`df$date`,
y=`df$value`))
Is this possible with aggregate? That is, can I compute the max aggregate values using year, but then have it add on the date field from the matching row in the data frame?
Thank you!!
Upvotes: 2
Views: 4378
Reputation: 2920
You can use base::merge() to assign the df$date
whose value
is shared in both df
and agg
by way of an inner-join. To not grab all of the variables within df
, I limit it to only include the date
and value
columns.
# load data
df <-
read.table(
text = "site year date value
MLO 1969 1969-08-20 323.95
MLO 1969 1969-08-27 324.58
MLO 1969 1969-09-02 321.61
MLO 1969 1969-09-12 321.15
MLO 1969 1969-09-24 321.15
MLO 1969 1969-10-03 320.54"
, header = TRUE
, stringsAsFactors = FALSE )
# calculate max value by year
ag <- aggregate( formula = value ~ year, data = df, FUN = max )
# grab the date from df that matches
# the value from agg
ag <-
merge( x = ag
, y = df[c("date", "value")]
, by = "value"
, all = FALSE ) # to indicate that an inner-join be performed
# view results
ag
# value year date
# 1 324.58 1969 1969-08-27
# end of script #
Upvotes: 2
Reputation: 33772
You could use dplyr::mutate
instead of aggregate
to create a new column with the maximum values by year. Then you can map separate geoms to the original variable and the new column. I'd indicate the aggregate using a coloured line.
Using example data with 2 years:
df1 <- structure(list(site = c("MLO", "MLO", "MLO", "MLO", "MLO", "MLO"),
year = c(1970, 1970, 1970, 1969, 1969, 1969),
date = c("1970-08-20", "1970-08-27", "1970-09-02",
"1969-09-12", "1969-09-24", "1969-10-03"),
value = c(323.95, 324.58, 321.61, 321.15, 321.15, 320.54)),
class = "data.frame",
.Names = c("site", "year", "date", "value"),
row.names = c(NA, -6L))
library(tidyverse)
df1 %>%
group_by(year) %>%
mutate(maxVal = max(value)) %>%
ungroup() %>%
ggplot() +
geom_point(aes(date, value)) +
geom_line(aes(date, maxVal, group = year), color = "red")
There's probably a clever way to do it using stat_summary
too.
Upvotes: 0
Reputation: 2688
Solution using dplyr
and made up data
library(dplyr)
df <- data.frame(year = c(1969, 1969, 1969, 1970, 1970), date = c("1969-08-20", "1969-08-21", "1969-08-22", "1970-08-20", "1969-08-21"),
value = c(1,3,2, 10, 8))
df %>% group_by(year) %>% summarise(max_val = max(value),
max_date = date[which.max(value)])
# A tibble: 2 x 3
year max_val max_date
<dbl> <dbl> <chr>
1 1969. 3. 1969-08-21
2 1970. 10. 1970-08-20
Upvotes: 2