Magasinus

Reputation: 83

R - Cumsum of product over rolling windows (quarters)

I want to calculate the cumulative sum of the column value with a special requirement. After a quarter (= 3 months) has passed, the value is depreciated by a constant, say 0.9. That means it now factors into the cumsum with only 90% of its original value. When another quarter has passed, this is repeated, i.e. those 90% are multiplied by the constant (0.9) again (which equals the original value * 0.9^2). This procedure is repeated for every quarter that passes, so after a year 0.9^4 of the original value goes into the cumsum, after two years 0.9^8, and so on. A value only enters the cumsum at its full, undepreciated amount during the first quarter after its date.

Example: a value of 100 on 01/15 would add to the cumsum with 100 until 04/15, with 100*0.9=90 between 04/15 and 07/15, with 100*0.9^2=81 between 07/15 and 10/15, and so on; see the desired_output column of the MRE.
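Expressed as R code, the weight applied to a value after q full quarters have passed is simply constant^q (weight below is just a throwaway helper for illustration):

weight <- function(q, const = 0.9) const^q   # depreciation factor after q full quarters
weight(0:3) * 100                            # 100 90 81 72.9 -> contribution of the 01/15 value over time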

MRE:

library(tibble)

df <- tribble(~date, ~value, ~country, ~desired_output,
              "2017-01-01", 2, "US", 2,
              "2017-01-05", 2, "UK", 8,
              "2017-01-05", 4, "US", 8,
              "2017-04-01", 5, "IT", 12.8,  # only the first observation is older than a quarter so calculate as (2*0.9+2+4+5)
              "2017-04-03", 3, "US", 115.8, #(2*0.9+2+4+5+3+100)
              "2017-04-03", 100, "US", 115.8,
              "2017-04-11", 20, "UK", 135.2,  # now the first three observations are older than a quarter (8*0.9+5+3+100+20)
              "2017-04-15", 6, "US", 141.2,
              "2017-07-02", 30, "US", 170.52,  # now the first observation is older than two quarters and the second, third, and forth observation are older than a quarter (2*0.9^2+11*0.9+3+100+20+6+30)
              "2017-10-12", 4, "UK", 151.912,  # ((2+2+4)*0.9^3+(5+3+100+20)*0.9^2+(6+30)*0.9+4+6)
              "2017-10-12", 6, "IT", 151.912) # (8*0.9^3+128*0.9^2+36*0.9+10)

Assume the constant has a value of 0.9 and the dates are in %Y-%m-%d format.


Pseudo-code to further clarify the question:

  1. Take the cumsum of each quarter / 3-month period (counted back from the date of the current row)

  2. Exponentiate the constant by the distance of that period from the current row's date (e.g. C^2 if the period is two quarters back)

  3. Multiply this adjusted constant by the cumsum of the respective quarter (3-month period)

  4. Take the cumsum of all these adjusted quarter cumsums


As a next step, I would also want to compute this cumsum by a grouping variable (e.g. country).
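
To make the logic concrete, here is a naive (and slow) base-R sketch of the steps above; note it assumes a fixed 90-day quarter instead of calendar quarters and ignores the grouping for now:

df$date <- as.Date(df$date)

df$cumsum_dep <- sapply(seq_len(nrow(df)), function(i) {
  idx  <- which(df$date <= df$date[i])                       # all rows up to and including the current date
  nqtr <- floor(as.numeric(df$date[i] - df$date[idx]) / 90)  # full 90-day quarters elapsed
  sum(df$value[idx] * 0.9^nqtr)                              # depreciate by 0.9 per elapsed quarter
})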

Upvotes: 1

Views: 242

Answers (1)

chinsoon12

Reputation: 25223

Here is an option:

DT[, do := 
    # non-equi self-join: for each row, look at all rows with date <= current date
    .SD[.SD, on=.(date<=date), by=.EACHI, {
        # number of full 90-day quarters between each earlier date and the current date
        nqtr <- floor(pmax(0, i.date - x.date) / 90)
        # depreciate each value by 0.9 per elapsed quarter, then sum
        sum(value * 0.9^nqtr)
    }]$V1
]

output:

          date value country desired_output      do
 1: 2017-01-01     2      US          2.000   2.000
 2: 2017-01-05     2      UK          8.000   8.000
 3: 2017-01-05     4      US          8.000   8.000
 4: 2017-04-01     5      IT         12.800  12.800
 5: 2017-04-03     3      US        115.800 115.800
 6: 2017-04-03   100      US        115.800 115.800
 7: 2017-04-11    20      UK        135.200 135.200
 8: 2017-04-15     6      US        141.200 141.200
 9: 2017-07-02    30      US        170.520 160.220
10: 2017-10-12     4      UK        151.912 151.372
11: 2017-10-12     6      IT        151.912 151.372

The difference from your desired output is in how a quarter is defined. I used 90 days; if calendar 3-month quarters are really important, I will update the post. For example, on 2017-07-02 rows 2 - 6 are one quarter ago when using 90 days, whereas in your OP only rows 2 - 4 are one quarter ago when using 3 months.
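
The boundary case can be checked with plain base-R date arithmetic (just for illustration):

as.Date("2017-07-02") - as.Date("2017-04-03")                     # 90 days -> already one quarter back under the 90d rule
seq(as.Date("2017-04-03"), by = "3 months", length.out = 2L)[2L]  # "2017-07-03" -> not yet a quarter back under 3m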

data:

library(data.table)    
DT <- fread('date,value,country,desired_output
"2017-01-01", 2, "US", 2
"2017-01-05", 2, "UK", 8
"2017-01-05", 4, "US", 8
"2017-04-01", 5, "IT", 12.8  
"2017-04-03", 3, "US", 115.8 
"2017-04-03", 100, "US", 115.8
"2017-04-11", 20, "UK", 135.2
"2017-04-15", 6, "US", 141.2
"2017-07-02", 30, "US", 170.52
"2017-10-12", 4, "UK", 151.912
"2017-10-12", 6, "IT", 151.912')
DT[, date := as.IDate(date, format="%Y-%m-%d")]

Handling the 3-month and country requirements:

DT[, do := 
    # non-equi self-join within each country: look at all rows with date <= current date
    .SD[.SD, on=.(country, date<=date), by=.EACHI, {
        # calendar-quarter boundaries stepping back from the current date; the 93-day
        # buffer ensures the sequence reaches past the earliest matching date
        vec <- rev(seq(i.date, min(x.date)-93L, by="-1 quarter"))
        # locate each earlier date in those quarter intervals
        itvl <- findInterval(x.date, vec, rightmost.closed=TRUE)
        # number of full calendar quarters elapsed (0 for the current quarter)
        nqtr <- length(vec) - itvl - 1L
        # depreciate each value by 0.9 per elapsed quarter, then sum
        sum(value * 0.9^nqtr)
    }]$V1
]

output:

          date value country desired_output      do
 1: 2017-01-01     2      US          2.000   2.000
 2: 2017-01-05     2      UK          8.000   8.000
 3: 2017-01-05     4      US          8.000   8.000
 4: 2017-04-01     5      IT         12.800  13.000
 5: 2017-04-03     3      US        115.800 115.800
 6: 2017-04-03   100      US        115.800 115.800
 7: 2017-04-11    20      UK        135.200 135.200
 8: 2017-04-15     6      US        141.200 141.200
 9: 2017-07-02    30      US        170.520 170.520
10: 2017-10-12     4      UK        151.912 151.912
11: 2017-10-12     6      IT        151.912 151.912

Upvotes: 2
