vog
vog

Reputation: 836

Lagged rolling interval window in dplyr

I have a dataset like df, where year_month represents the year in the first 2 digits, and the month in the last 2 digits (1901=01/2019).

id <- c(1,1,1,1,1,2,2,2,2,2)
year_month <- c(1801,1809,1812,1901,1908, 1901,1909,1912,2011,2012 )
value <- c(10,20,30,40,50,60,70,80,90,100)

df <- data.frame(id, year_month, value)

I want to sum the accumulated value of each firm during the last year (e.g. if we are in 1901=01/2019, I want to start counting since 1801 (01/2018) and record the history of values prior to 1901, i.e. count from 1801 until 1812). The dataset will look like one_year_window

value_last_year <- c(NA,10,30,60,90,NA,60,130,80,170)

one_year_window <- data.frame(id, year_month, value, value_last_year)

I have tried several ways (cummsum()) in dplyr without success.

Any clue?

Upvotes: 1

Views: 230

Answers (1)

LMc
LMc

Reputation: 18692

df %>% 
  group_by(id, year = round(year_month, -2)) %>% 
  mutate(value_last_year = cumsum(value)) %>%
  ungroup(year) %>% 
  mutate(value_last_year = ifelse(year == max(year), value, NA)) %>% 
  ungroup() %>%
  select(-year)

Though it may be to your advantage to compute the cumulative sum across all years and then filter by id and year_month if you need their most recent records.

Output

      id year_month value value_last_year
   <dbl>      <dbl> <dbl>           <dbl>
 1     1       1801    10              NA
 2     1       1809    20              NA
 3     1       1812    30              NA
 4     1       1901    40              40
 5     1       1908    50              50
 6     2       1901    60              NA
 7     2       1909    70              NA
 8     2       1912    80              NA
 9     2       2011    90              90
10     2       2012   100             100

Update

Based on your updated post. I do not take credit for this answer. You should follow this link and upvote the answer there that this is based on:

library(dplyr)
library(lubridate)
library(zoo)

df %>% 
  mutate(yrmd = ymd(paste0(year_month, "01"))) %>% 
  group_by(id) %>% 
  mutate(value_last_year = rollapplyr(value, width = 1:n() - findInterval(yrmd %m-% months(13), yrmd), sum),
         value_last_year = ifelse(row_number() == 1, NA, value_last_year - value)) %>% 
  select(-yrmd) %>% 
  ungroup()

Output

      id year_month value value_last_year
   <dbl>      <dbl> <dbl>           <dbl>
 1     1       1801    10              NA
 2     1       1809    20              30
 3     1       1812    30              60
 4     1       1901    40              90
 5     1       1908    50             140
 6     2       1901    60              NA
 7     2       1909    70             130
 8     2       1912    80             210
 9     2       2011    90             170
10     2       2012   100             190

Upvotes: 2

Related Questions