Reputation: 3805

conditional rolling average in R

    dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975, 
    1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
    1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
    1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
    2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018), 
    value = c(0.761253538863966, 0.778365700864592, 0.748473422160476, 
    0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434, 
    0.914998995290579, 0.903241004636529, 0.883446087736501, 
    0.878399385374308, 0.790239960507709, 0.853841173129717, 
    0.972923769177295, 0.899133969911117, 0.865840008976815, 
    0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142, 
    0.985789564141214, 0.978212191208007, 0.885157529562834, 
    1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467, 
    0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705, 
    0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792, 
    0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532, 
    1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128, 
    0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573, 
    1.14604631626466, 1.06449109417896)), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -49L))

For each year, I want to calculate the mean of top 5 values from the previous 7 values. For e.g. the first mean value will be for 1977 and will consist of mean of best 5 years from 1970 till 1976. Similarly, for 1978, mean value will be the top 5 values from 1971-1977. Similarly, for 2018, the mean value will be top 5 values from 2011 - 2017

I have the following code from SO which sort of does the job.

  library(data.table)
  library(zoo)

  setDT(dat)

  dat[, mean.val:= if (.N > 6) 
        rollapplyr(value, 7,function(x) mean(tail(sort(x), 5)), fill = NA)  
        else mean(value)]

though the first value in the new column mean.val is correct, it should be assigned to the row which has 1977 but has been assigned to 1976.

Upvotes: 0

Answers (4)

G. Grothendieck

Reputation: 270248

You want to process the PRIOR 7 points rather than the 7 points that end at the current point. To do that use a width of list(-(1:7)). That says to use offsets -1 through -7 when processing the data. See ?rollapply for more information on specifying the width argument.

This (1) more directly specifies the intention making it easier to comprehend than approaches which require ignoring the required offsets and then fixing it up later and (2) uses only the packages you are already using (3) expresses the solution compactly and (4) preserves your solution changing only one argument.

  dat[, mean.val:= if (.N > 6) 
        rollapply(value, list(-(1:7)), function(x) mean(tail(sort(x), 5)), fill = NA)  
        else mean(value)]

Upvotes: 3

Bruno

Reputation: 4150

I think you can use the excellent tsibble package for an amazing rolling function and then you can use the lead function to displace the results

library(tidyverse)

dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975, 
                                  1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
                                  1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
                                  1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
                                  2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018), 
                      value = c(0.761253538863966, 0.778365700864592, 0.748473422160476, 
                                0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434, 
                                0.914998995290579, 0.903241004636529, 0.883446087736501, 
                                0.878399385374308, 0.790239960507709, 0.853841173129717, 
                                0.972923769177295, 0.899133969911117, 0.865840008976815, 
                                0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142, 
                                0.985789564141214, 0.978212191208007, 0.885157529562834, 
                                1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467, 
                                0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705, 
                                0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792, 
                                0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532, 
                                1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128, 
                                0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573, 
                                1.14604631626466, 1.06449109417896)), class = c("tbl_df", 
                                                                                "tbl", "data.frame"), row.names = c(NA, -49L))

complex_function <- . %>% 
  sort %>% 
  tail(.,5) %>% 
  mean

dat %>%
  mutate(roll_avg  = tsibble::slide_dbl(.x = value,.f = complex_function,.size = 7),
         roll_avg2 = lag(roll_avg))
#> # A tibble: 49 x 4
#>    yearRef value roll_avg roll_avg2
#>      <dbl> <dbl>    <dbl>     <dbl>
#>  1    1970 0.761   NA        NA    
#>  2    1971 0.778   NA        NA    
#>  3    1972 0.748   NA        NA    
#>  4    1973 0.790   NA        NA    
#>  5    1974 0.727   NA        NA    
#>  6    1975 0.806   NA        NA    
#>  7    1976 0.816    0.790    NA    
#>  8    1977 0.915    0.821     0.790
#>  9    1978 0.903    0.846     0.821
#> 10    1979 0.883    0.865     0.846
#> # … with 39 more rows

^{Created on 2020-01-14 by the reprex package (v0.3.0)}

Upvotes: 0

Filipe Lauar

Reputation: 444

This simple for loop solve the problem:

dat$mean.val = NA

for(i in 8:nrow(dat))
{
  dat$mean.val[i] = mean(sort(dat$value[(i-7):(i-1)],decreasing = TRUE)[1:5])
}

Upvotes: 0

IceCreamToucan

Reputation: 28705

If the only issue is that the values should be shifted down 1 row, you can use shift to fix this.

dat[, mean.val := shift(mean.val)]

FYI if you're on version >= 1.12.4 data.table you don't need zoo and can use data.table::frollapply.

dat[, mean.val2 := 
      shift(frollapply(value, 7, function(x) mean(tail(sort(x), 5))))]

dat[, all.equal(mean.val, mean.val2)] #TRUE

Upvotes: 2

conditional rolling average in R

Answers (4)

Related Questions