Andrew

Reputation: 688

Find the maximum value within a range of dates with dplyr in R

This may sound like a silly question but I couldn't find an answer online. I have a large dataset which looks like this:

set.seed(1)
df <- data.frame(date = as.Date("2010-01-01")+seq(0,729), value = rnorm(730)) 

I would like to use dplyr to find the rolling maximum within a window of one year on either side of each date. For instance, for the date "2010-05-01", I would like the maximum of all values dated between "2009-05-01" and "2011-05-01". The result, max_value, should be a new column of df. Thank you.

Upvotes: 1

Views: 1453

Answers (4)

Edo

Reputation: 7858

slider is an r-lib package designed to work alongside the tidyverse; it is not attached by library(tidyverse), so load it separately.

Try this:

library(dplyr)
library(slider)
df %>% mutate(two_years_max = slide_index_dbl(value, date, max, .before = 365, .after = 365))
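One way to check that the window is measured in days rather than rows is to compare a single row against a direct date filter. This is only a minimal sketch on the question's data: the column name `max_value` follows the question, and the spot-check date is the one the question mentions.

```r
library(dplyr)
library(slider)

set.seed(1)
df <- data.frame(date = as.Date("2010-01-01") + seq(0, 729), value = rnorm(730))

# Rolling max over all rows dated within 365 days before or after each date
res <- df %>%
  mutate(max_value = slide_index_dbl(value, date, max, .before = 365, .after = 365))

# Spot check: recompute one row by filtering on the dates directly
target <- as.Date("2010-05-01")
manual <- max(df$value[df$date >= target - 365 & df$date <= target + 365])

# TRUE if the sliding result matches the manual filter for that date
all.equal(res$max_value[res$date == target], manual)
```

Because the window is defined on `date`, the same call should also cope with missing or irregularly spaced days, which a row-count window would not.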

Upvotes: 4

Allan Cameron

Reputation: 174616

You could use zoo::rollmax:

df %>% 
  mutate(rollmax = zoo::rollmax(value, 720, align = "center", 
                                fill = c("extend", NA, "extend")))

You can change the fill argument to decide what happens in the first and last year's worth of data, where a full two-year window is not available. Note that the width counts rows, not days, so this assumes one row per day. (Your example only has two years of data, so only a handful of dates near the middle have a full window on both sides.)
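To see how many rows actually get a complete window before any filling, you can run the same `rollmax` call with `fill = NA` and count the non-missing results. This is a sketch under the same assumption of one row per day; `full_only` is just an illustrative name.

```r
library(zoo)

set.seed(1)
df <- data.frame(date = as.Date("2010-01-01") + seq(0, 729), value = rnorm(730))

# fill = NA leaves every position without a full 720-row window as NA
full_only <- rollmax(df$value, k = 720, align = "center", fill = NA)

# Number of rows that had a complete window: n - k + 1
sum(!is.na(full_only))
```

If truncated windows at the edges are acceptable instead of filled values, `zoo::rollapply()` has a `partial` argument that may be a closer fit to the question; that variant is not shown here.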

Upvotes: 1

Mitchell Graham

Reputation: 155

Not the most elegant, but it'll work.

library(dplyr)

set.seed(1)
df <- data.frame(date = as.Date("2010-01-01") + seq(0, 729), value = rnorm(730))

for (i in seq_len(nrow(df))) {
  # Max of all values dated within 365 days either side of row i
  tmp <- df %>%
    filter(date >= df[i, 'date'] - 365, date <= df[i, 'date'] + 365) %>%
    summarise(value = max(value)) %>%
    pull()

  df[i, 'max_value'] <- tmp
}

head(df)

        date      value max_value
1 2010-01-01 -0.6264538  2.649167
2 2010-01-02  0.1836433  2.649167
3 2010-01-03 -0.8356286  2.649167
4 2010-01-04  1.5952808  2.649167
5 2010-01-05  0.3295078  2.649167
6 2010-01-06 -0.8204684  2.649167
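The same per-row filter can be written without growing `df` inside a loop. The following is a base R sketch of that idea, not a different algorithm: it still scans all rows for every date, so it is fine for 730 rows but may be slow on a very large dataset.

```r
set.seed(1)
df <- data.frame(date = as.Date("2010-01-01") + seq(0, 729), value = rnorm(730))

# For each row, take the max of all values dated within 365 days either side
df$max_value <- vapply(
  seq_len(nrow(df)),
  function(i) {
    in_window <- df$date >= df$date[i] - 365 & df$date <= df$date[i] + 365
    max(df$value[in_window])
  },
  numeric(1)
)

head(df)
```

`vapply` with `numeric(1)` guarantees one number per row, so the result can be assigned to the new column in one step.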

Upvotes: 1

Ryan John

Reputation: 1430

This may be what you're looking for:

library(tidyverse)
set.seed(1)
df <- data.frame(date = as.Date("2010-01-01")+seq(0,729), value = rnorm(730)) 

df %>% as_tibble() %>%
  dplyr::mutate(previous = max(lag(value, order_by = date, n = 365), na.rm = TRUE),
                nexts = max(lead(value, order_by = date, n = 365), na.rm = TRUE),
                max_value = max(previous, nexts))
#> # A tibble: 730 x 5
#>    date        value previous nexts max_value
#>    <date>      <dbl>    <dbl> <dbl>     <dbl>
#>  1 2010-01-01 -0.626     2.65  3.81      3.81
#>  2 2010-01-02  0.184     2.65  3.81      3.81
#>  3 2010-01-03 -0.836     2.65  3.81      3.81
#>  4 2010-01-04  1.60      2.65  3.81      3.81
#>  5 2010-01-05  0.330     2.65  3.81      3.81
#>  6 2010-01-06 -0.820     2.65  3.81      3.81
#>  7 2010-01-07  0.487     2.65  3.81      3.81
#>  8 2010-01-08  0.738     2.65  3.81      3.81
#>  9 2010-01-09  0.576     2.65  3.81      3.81
#> 10 2010-01-10 -0.305     2.65  3.81      3.81
#> # ... with 720 more rows

Created on 2020-08-20 by the reprex package (v0.3.0)
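Note that in the output above every row shows the same `max_value`. A likely reason is that `max()` collapses each shifted column to a single number, which `mutate()` then recycles down the whole column, so the result is not a per-date rolling maximum. The sketch below checks this on the same data; `res` is just an illustrative name.

```r
library(dplyr)

set.seed(1)
df <- data.frame(date = as.Date("2010-01-01") + seq(0, 729), value = rnorm(730))

res <- df %>%
  mutate(previous = max(lag(value, n = 365, order_by = date), na.rm = TRUE),
         nexts = max(lead(value, n = 365, order_by = date), na.rm = TRUE),
         max_value = max(previous, nexts))

# max() reduces each shifted column to one number, so every row is identical
n_distinct(res$max_value)

# That single number is the maximum of the whole value column
all.equal(res$max_value[1], max(df$value))
```

For a maximum that changes with each date, the window has to be evaluated per row, as in the other answers.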

Upvotes: 1
