CoolGuyHasChillDay
CoolGuyHasChillDay

Reputation: 747

(dplyr) Sum of N values most recent to a date

I'm trying to create a function that sums the closest n values to a given date. So if I had 5 weeks of data, and n=2, the value on week 1 would be the sum of weeks 2&3, the value on week 2 would be the sum of weeks 1&3, etc. Example:

library(dplyr)
library(data.table)

Week <- 1:5
Sales <- c(1, 3, 5, 7, 9)

frame <- data.table(Week, Sales)
frame
   Week Sales  Recent
1:    1     1    8
2:    2     3    6
3:    3     5    10
4:    4     7    14
5:    5     9    12

I want to make a function that does this for me with an input for most recent n (not just 2), but for now I want to get 2 right. Here's my function using lag/lead:

RecentSum = function(Variable, Lags){
  Sum = 0
  for(i in 1:(Lags/2)){ #Lags/2 because I want half values before and half after
    #Check to see if you can go backwards. If not, go foward (i.e. use lead).
    if(is.na(lag(Variable, i))){
      LoopSum = lead(Variable, i)
    }
    else{
      LoopSum = lag(Variable, i)
    }
    Sum = Sum + LoopSum
  }
  for(i in 1:(Lags/2)){
    if(is.na(lead(Variable, i))){ #Check to see if you can go forward. If not, go backwards (i.e. use lag).
      LoopSum = lag(Variable, i)
    }
    else{
      LoopSum = lead(Variable, i)
    }
    Sum = Sum + LoopSum
  }
  Sum
}

When I do RecentSum(frame$Sale,2) I get [1] 6 10 14 18 NA which is wrong for a number of reasons:

  1. My if statements are only hitting on week one, so it will always be NA for lag and always be non-NA for lead.
  2. I need to have a way to see if it uses lag/lead the first time. The first value is 6 instead of 8 because the first for-loop sends it to lead(_,1), but then the second for-loop does the same. I can't think of how I'd make my second for-loop recognize this.

Is there a function or library (Zoo?) that makes this task easy? I'd like to get my own function to work for the sake of practice/understanding, but at this point I'd rather just get it done.

Thanks!

Upvotes: 1

Views: 842

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269576

1) Assuming that k is even define to as a vector of indices such that for each element of to we sum the k+1 elements of Sales that end in that index and from that subtract Sales:

k <- 2  # number of elements to sum
n <- nrow(frame)
to <- pmax(k+1, pmin(1:n + k/2, n))
Sum <- function(to, Sales) sum(Sales[seq(to = to, length = k+1)])
frame %>% mutate(recent = sapply(to, Sum, Sales) - Sales)

giving:

  Week Sales recent
1    1     1      8
2    2     3      6
3    3     5     10
4    4     7     14
5    5     9     12    

Note that by replacing the last line of code above with the following line the solution can be done entirely in base R:

transform(frame, recent = sapply(to, Sum, Sales) - Sales)

2) This concatenates the appropriate elements before and after the Sales series so that an ordinary rolling sum gives the result.

library(zoo)

ix <- c(seq(to = k+1, length = k/2), 1:n, seq(to = n-k, length = k/2))
frame %>% mutate(recent = rollsum(Sales[ix], k+1) - Sales)

Note that if k=2 then it reduces this to this one-liner:

frame %>% mutate(recent = rollsum(Sales[c(3, 1:n(), n()-2)], 3) - Sales)

giving:

  Week Sales recent
1    1     1      8
2    2     3      6
3    3     5     10
4    4     7     14
5    5     9     12

Update: fixed for k > 2

Upvotes: 1

tbradley
tbradley

Reputation: 2280

To elaborate on my comment, lead and lag are functions that are meant to be used within vectorized functions such as dplyr. Here is a way to do it within dplyr without using a function:

df <- tibble(week = Week, sales = Sales)

df %>%
  mutate(recent = case_when(is.na(lag(sales)) ~ lead(sales, n = 1) + lead(sales, n = 2),
                            is.na(lead(sales)) ~ lag(sales, n = 1) + lag(sales, n = 2),
                            TRUE ~ lag(sales) + lead(sales)))

That gives you this:

# A tibble: 5 x 3
   week sales recent
   <int> <dbl>  <dbl>
1     1     1      8
2     2     3      6
3     3     5     10
4     4     7     14
5     5     9     12

Upvotes: 4

Related Questions