Reputation: 167
I am trying to sum rows that are withing last 12 months period.
df<- read.table(header=T, text='yyyymm amount
200809 261900
200810 149000
200908 120000
201104 81500
201107 30000
201112 75000
201306 56000
201310 98000
201311 40000
201402 95000
201408 28900
201505 33000
201511 65000
201601 78000
201610 50000
201701 50000
201711 81500')
For example, for row one(200809==Sept, 2008), I don't have any previous data that are within past 12 months, so the new variable I want to create (sum_amt) ends up in sum_amt=NA. For row two, I have one record that is within past 12 months of 200810 (i.e. between 200710 & 200809), so sum_amt=261999. Similarly, for row three, I have two records that are within 12 months of 200908, so sum_amt=410900 (261900 + 261900) and so on.
Final results would be:
yyyymm amount sum_amt
200809 261900 NA
200810 261900 261900
200908 120000 410900
201104 81500 NA
201107 30000 81500
201112 75000 111500
201306 56000 NA
201310 98000 56000
201311 40000 154000
201402 95000 194000
201408 28900 233000
201505 33000 28900
201511 65000 33000
201601 78000 98000
201610 50000 143000
201701 50000 128000
201711 81500 50000
Thank you in advanced!!
Upvotes: 1
Views: 568
Reputation: 270348
Define a function sumfun which accepts a zoo series with yearmon time index. yearmon class internally represents a year and month as year + 0 for Jan, year + 1/12 for Feb and so on so subtracting 1 from a yearmon object gives one year back. sumfun returns the sum of the amounts in the last year or NA if there are no points before the current point. Then read df
into zoo series amount
and perform a rolling sum over it using sumfun
. Finally convert back to data frame.
library(zoo)
sumfun <- function(x) {
y <- x[time(x) < end(x) & time(x) >= end(x) - 1]
if (length(y) == 0) NA else sum(y)
}
amount <- read.zoo(df, FUN = function(x) as.yearmon(paste(x), "%Y%m"))
sum12 <- rollapplyr(amount, 13, sumfun, partial = TRUE, fill = NA, coredata = FALSE)
fortify.zoo(cbind(amount, sum12), name = names(df)[1])
giving:
yyyymm amount sum12
1 Sep 2008 261900 NA
2 Oct 2008 149000 261900
3 Aug 2009 120000 410900
4 Apr 2011 81500 NA
5 Jul 2011 30000 81500
6 Dec 2011 75000 111500
7 Jun 2013 56000 NA
8 Oct 2013 98000 56000
9 Nov 2013 40000 154000
10 Feb 2014 95000 194000
11 Aug 2014 28900 233000
12 May 2015 33000 28900
13 Nov 2015 65000 33000
14 Jan 2016 78000 98000
15 Oct 2016 50000 143000
16 Jan 2017 50000 128000
17 Nov 2017 81500 50000
Upvotes: 2
Reputation: 4480
This is my proposal:
df<- read.table(header=T, text='yyyymm amount
200809 261900
200810 149000
200908 120000
201104 81500
201107 30000
201112 75000
201306 56000
201310 98000
201311 40000
201402 95000
201408 28900
201505 33000
201511 65000
201601 78000
201610 50000
201701 50000
201711 81500')
df <- as.data.table(df)
df[, yyyymmdd:=ymd(paste0(yyyymm,"01"))]
sum_year <- function(origin){
sum(df[(origin-yyyymmdd)<=366 &(origin-yyyymmdd)>0]$amount)
}
df[, sum_amut:=mapply(sum_year, yyyymmdd)]
This will work under two assumptions:
And, finally, it returns 0 instead of NA when no date last 12 months is found.
Best
Upvotes: 1