Reputation: 3805
dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975,
1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018),
value = c(0.761253538863966, 0.778365700864592, 0.748473422160476,
0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434,
0.914998995290579, 0.903241004636529, 0.883446087736501,
0.878399385374308, 0.790239960507709, 0.853841173129717,
0.972923769177295, 0.899133969911117, 0.865840008976815,
0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142,
0.985789564141214, 0.978212191208007, 0.885157529562834,
1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467,
0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705,
0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792,
0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532,
1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128,
0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573,
1.14604631626466, 1.06449109417896)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -49L))
For each year, I want to calculate the mean of top 5 values from the previous 7 values. For e.g. the first mean value will be for 1977 and will consist of mean of best 5 years from 1970 till 1976. Similarly, for 1978, mean value will be the top 5 values from 1971-1977. Similarly, for 2018, the mean value will be top 5 values from 2011 - 2017
I have the following code from SO which sort of does the job.
library(data.table)
library(zoo)
setDT(dat)
dat[, mean.val:= if (.N > 6)
rollapplyr(value, 7,function(x) mean(tail(sort(x), 5)), fill = NA)
else mean(value)]
though the first value in the new column mean.val
is correct, it should be assigned to the row which has
1977 but has been assigned to 1976.
Upvotes: 0
Views: 298
Reputation: 270248
You want to process the PRIOR 7 points rather than the 7 points that end at the current point. To do that use a width of list(-(1:7))
. That says to use offsets -1 through -7 when processing the data. See ?rollapply
for more information on specifying the width
argument.
This (1) more directly specifies the intention making it easier to comprehend than approaches which require ignoring the required offsets and then fixing it up later and (2) uses only the packages you are already using (3) expresses the solution compactly and (4) preserves your solution changing only one argument.
dat[, mean.val:= if (.N > 6)
rollapply(value, list(-(1:7)), function(x) mean(tail(sort(x), 5)), fill = NA)
else mean(value)]
Upvotes: 3
Reputation: 4150
I think you can use the excellent tsibble package for an amazing rolling function and then you can use the lead function to displace the results
library(tidyverse)
dat <- structure(list(yearRef = c(1970, 1971, 1972, 1973, 1974, 1975,
1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018),
value = c(0.761253538863966, 0.778365700864592, 0.748473422160476,
0.790408287413012, 0.726707786670043, 0.80587461240495, 0.81582881742434,
0.914998995290579, 0.903241004636529, 0.883446087736501,
0.878399385374308, 0.790239960507709, 0.853841173129717,
0.972923769177295, 0.899133969911117, 0.865840008976815,
0.85942147306247, 0.9471790327507, 0.905362802563981, 0.91644169495142,
0.985789564141214, 0.978212191208007, 0.885157529562834,
1.01638026873823, 1.02702020472382, 0.944421276774342, 0.979439113456467,
0.951183598644539, 1.12054063623421, 1.00767230122493, 1.02132151007705,
0.95649988168142, 0.928385199359045, 1.05071183719421, 1.11654102944792,
0.910601547182633, 0.936460862711605, 1.2398210426787, 0.979036947391532,
1.09931214756341, 1.12206830109171, 0.997384903912461, 1.07413151131128,
0.967026290186151, 1.04921352764649, 1.08746580600605, 1.02444885186573,
1.14604631626466, 1.06449109417896)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -49L))
complex_function <- . %>%
sort %>%
tail(.,5) %>%
mean
dat %>%
mutate(roll_avg = tsibble::slide_dbl(.x = value,.f = complex_function,.size = 7),
roll_avg2 = lag(roll_avg))
#> # A tibble: 49 x 4
#> yearRef value roll_avg roll_avg2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1970 0.761 NA NA
#> 2 1971 0.778 NA NA
#> 3 1972 0.748 NA NA
#> 4 1973 0.790 NA NA
#> 5 1974 0.727 NA NA
#> 6 1975 0.806 NA NA
#> 7 1976 0.816 0.790 NA
#> 8 1977 0.915 0.821 0.790
#> 9 1978 0.903 0.846 0.821
#> 10 1979 0.883 0.865 0.846
#> # … with 39 more rows
Created on 2020-01-14 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 444
This simple for loop solve the problem:
dat$mean.val = NA
for(i in 8:nrow(dat))
{
dat$mean.val[i] = mean(sort(dat$value[(i-7):(i-1)],decreasing = TRUE)[1:5])
}
Upvotes: 0
Reputation: 28705
If the only issue is that the values should be shifted down 1 row, you can use shift
to fix this.
dat[, mean.val := shift(mean.val)]
FYI if you're on version >= 1.12.4 data.table you don't need zoo and can use data.table::frollapply
.
dat[, mean.val2 :=
shift(frollapply(value, 7, function(x) mean(tail(sort(x), 5))))]
dat[, all.equal(mean.val, mean.val2)] #TRUE
Upvotes: 2