Reputation: 747
I'm trying to create a function that sums the closest n values to a given date. So if I had 5 weeks of data, and n=2, the value on week 1 would be the sum of weeks 2&3, the value on week 2 would be the sum of weeks 1&3, etc. Example:
library(dplyr)
library(data.table)
Week <- 1:5
Sales <- c(1, 3, 5, 7, 9)
frame <- data.table(Week, Sales)
frame
Week Sales Recent
1: 1 1 8
2: 2 3 6
3: 3 5 10
4: 4 7 14
5: 5 9 12
I want to make a function that does this for me with an input for most recent n (not just 2), but for now I want to get 2 right. Here's my function using lag/lead:
RecentSum = function(Variable, Lags){
Sum = 0
for(i in 1:(Lags/2)){ #Lags/2 because I want half values before and half after
#Check to see if you can go backwards. If not, go foward (i.e. use lead).
if(is.na(lag(Variable, i))){
LoopSum = lead(Variable, i)
}
else{
LoopSum = lag(Variable, i)
}
Sum = Sum + LoopSum
}
for(i in 1:(Lags/2)){
if(is.na(lead(Variable, i))){ #Check to see if you can go forward. If not, go backwards (i.e. use lag).
LoopSum = lag(Variable, i)
}
else{
LoopSum = lead(Variable, i)
}
Sum = Sum + LoopSum
}
Sum
}
When I do RecentSum(frame$Sale,2)
I get [1] 6 10 14 18 NA
which is wrong for a number of reasons:
if
statements are only hitting on week one, so it will always be NA for lag and always be non-NA for lead.6
instead of 8
because the first for-loop sends it to lead(_,1), but then the second for-loop does the same. I can't think of how I'd make my second for-loop recognize this.Is there a function or library (Zoo
?) that makes this task easy? I'd like to get my own function to work for the sake of practice/understanding, but at this point I'd rather just get it done.
Thanks!
Upvotes: 1
Views: 842
Reputation: 269576
1) Assuming that k is even define to
as a vector of indices such that for each element of to
we sum the k+1 elements of Sales
that end in that index and from that subtract Sales
:
k <- 2 # number of elements to sum
n <- nrow(frame)
to <- pmax(k+1, pmin(1:n + k/2, n))
Sum <- function(to, Sales) sum(Sales[seq(to = to, length = k+1)])
frame %>% mutate(recent = sapply(to, Sum, Sales) - Sales)
giving:
Week Sales recent
1 1 1 8
2 2 3 6
3 3 5 10
4 4 7 14
5 5 9 12
Note that by replacing the last line of code above with the following line the solution can be done entirely in base R:
transform(frame, recent = sapply(to, Sum, Sales) - Sales)
2) This concatenates the appropriate elements before and after the Sales series so that an ordinary rolling sum gives the result.
library(zoo)
ix <- c(seq(to = k+1, length = k/2), 1:n, seq(to = n-k, length = k/2))
frame %>% mutate(recent = rollsum(Sales[ix], k+1) - Sales)
Note that if k=2 then it reduces this to this one-liner:
frame %>% mutate(recent = rollsum(Sales[c(3, 1:n(), n()-2)], 3) - Sales)
giving:
Week Sales recent
1 1 1 8
2 2 3 6
3 3 5 10
4 4 7 14
5 5 9 12
Update: fixed for k > 2
Upvotes: 1
Reputation: 2280
To elaborate on my comment, lead
and lag
are functions that are meant to be used within vectorized functions such as dplyr
. Here is a way to do it within dplyr
without using a function:
df <- tibble(week = Week, sales = Sales)
df %>%
mutate(recent = case_when(is.na(lag(sales)) ~ lead(sales, n = 1) + lead(sales, n = 2),
is.na(lead(sales)) ~ lag(sales, n = 1) + lag(sales, n = 2),
TRUE ~ lag(sales) + lead(sales)))
That gives you this:
# A tibble: 5 x 3
week sales recent
<int> <dbl> <dbl>
1 1 1 8
2 2 3 6
3 3 5 10
4 4 7 14
5 5 9 12
Upvotes: 4