Reputation: 3032

Add column to aggregate from data frame in R

I have some data in a data frame, it looks like this (head) in data frame, df:

  site year       date  value
1  MLO 1969 1969-08-20 323.95
2  MLO 1969 1969-08-27 324.58
3  MLO 1969 1969-09-02 321.61
4  MLO 1969 1969-09-12 321.15
5  MLO 1969 1969-09-24 321.15
6  MLO 1969 1969-10-03 320.54

I am using aggregate() to find the max value by year:

ag <- aggregate(df$value ~ df$year, data=df, max)

This works great, and I have the following (head) in ag:

       df$year      df$value
1         1969        324.58
2         1970        331.16
3         1971        325.89
4         1974        336.75
5         1976        333.87
6         1977        338.63

However, I'd like to plot the original data and then layer on the data from the aggregate and in order to do that I need a column with the full date field (the one that matches the maximum value) in the aggregate. In other words, I'd need each vector in the aggregate to look like:

          df$date df$year  df$value
1      1969-08-27    1969    324.58

and so on, so I can geom_point like so:

sp <- ggplot(df, aes(x=date, y=value)) +
  labs(x="Year", y="Value") 
sp + geom_point(colour="grey60", size=1) +
     geom_point(data=ag, aes(x=`df$date`, 
                             y=`df$value`))

Is this possible with aggregate? That is, can I compute the max aggregate values using year, but then have it add on the date field from the matching row in the data frame?

Thank you!!

Upvotes: 2

Answers (3)

Cristian E. Nuno

Reputation: 2920

Overview

You can use base::merge() to assign the df$date whose value is shared in both df and agg by way of an inner-join. To not grab all of the variables within df, I limit it to only include the date and value columns.

# load data
df <-
  read.table(
    text = "site year       date  value
      MLO 1969 1969-08-20 323.95
      MLO 1969 1969-08-27 324.58
      MLO 1969 1969-09-02 321.61
      MLO 1969 1969-09-12 321.15
      MLO 1969 1969-09-24 321.15
      MLO 1969 1969-10-03 320.54"
    , header = TRUE
    , stringsAsFactors = FALSE )

# calculate max value by year
ag <- aggregate( formula = value ~ year, data = df, FUN = max )

# grab the date from df that matches
# the value from agg
ag <-
  merge( x = ag
         , y = df[c("date", "value")]
         , by = "value"
         , all = FALSE ) # to indicate that an inner-join be performed

# view results
ag
# value year       date
# 1 324.58 1969 1969-08-27

# end of script #

Upvotes: 2

neilfws

Reputation: 33802

You could use dplyr::mutate instead of aggregate to create a new column with the maximum values by year. Then you can map separate geoms to the original variable and the new column. I'd indicate the aggregate using a coloured line.

Using example data with 2 years:

df1 <- structure(list(site = c("MLO", "MLO", "MLO", "MLO", "MLO", "MLO"),
                      year = c(1970, 1970, 1970, 1969, 1969, 1969),
                      date = c("1970-08-20", "1970-08-27", "1970-09-02",
                               "1969-09-12", "1969-09-24", "1969-10-03"),
                      value = c(323.95, 324.58, 321.61, 321.15, 321.15, 320.54)),
                      class = "data.frame",
                      .Names = c("site", "year", "date", "value"), 
                      row.names = c(NA, -6L))

library(tidyverse)
df1 %>% 
  group_by(year) %>% 
  mutate(maxVal = max(value)) %>% 
  ungroup() %>% 
  ggplot() + 
    geom_point(aes(date, value)) + 
    geom_line(aes(date, maxVal, group = year), color = "red")

There's probably a clever way to do it using stat_summary too.

Upvotes: 0

astrofunkswag

Reputation: 2718

Solution using dplyr and made up data

library(dplyr)
df <- data.frame(year = c(1969, 1969, 1969, 1970, 1970), date = c("1969-08-20", "1969-08-21", "1969-08-22", "1970-08-20", "1969-08-21"), 
                 value = c(1,3,2, 10, 8))

df %>% group_by(year) %>% summarise(max_val = max(value),
                                    max_date = date[which.max(value)])
# A tibble: 2 x 3
   year max_val max_date  
  <dbl>   <dbl> <chr>     
1 1969.      3. 1969-08-21
2 1970.     10. 1970-08-20

Upvotes: 2

Add column to aggregate from data frame in R

Answers (3)

Overview

Related Questions