Reputation: 836
I have a dataset like df, where year_month represents the year in the first 2 digits, and the month in the last 2 digits (1901=01/2019).
id <- c(1,1,1,1,1,2,2,2,2,2)
year_month <- c(1801,1809,1812,1901,1908, 1901,1909,1912,2011,2012 )
value <- c(10,20,30,40,50,60,70,80,90,100)
df <- data.frame(id, year_month, value)
I want to sum the accumulated value of each firm during the last year (e.g. if we are in 1901=01/2019, I want to start counting since 1801 (01/2018) and record the history of values prior to 1901, i.e. count from 1801 until 1812). The dataset will look like one_year_window
value_last_year <- c(NA,10,30,60,90,NA,60,130,80,170)
one_year_window <- data.frame(id, year_month, value, value_last_year)
I have tried several ways (cummsum()) in dplyr without success.
Any clue?
Upvotes: 1
Views: 230
Reputation: 18692
df %>%
group_by(id, year = round(year_month, -2)) %>%
mutate(value_last_year = cumsum(value)) %>%
ungroup(year) %>%
mutate(value_last_year = ifelse(year == max(year), value, NA)) %>%
ungroup() %>%
select(-year)
Though it may be to your advantage to compute the cumulative sum across all years and then filter by id
and year_month
if you need their most recent records.
Output
id year_month value value_last_year
<dbl> <dbl> <dbl> <dbl>
1 1 1801 10 NA
2 1 1809 20 NA
3 1 1812 30 NA
4 1 1901 40 40
5 1 1908 50 50
6 2 1901 60 NA
7 2 1909 70 NA
8 2 1912 80 NA
9 2 2011 90 90
10 2 2012 100 100
Update
Based on your updated post. I do not take credit for this answer. You should follow this link and upvote the answer there that this is based on:
library(dplyr)
library(lubridate)
library(zoo)
df %>%
mutate(yrmd = ymd(paste0(year_month, "01"))) %>%
group_by(id) %>%
mutate(value_last_year = rollapplyr(value, width = 1:n() - findInterval(yrmd %m-% months(13), yrmd), sum),
value_last_year = ifelse(row_number() == 1, NA, value_last_year - value)) %>%
select(-yrmd) %>%
ungroup()
Output
id year_month value value_last_year
<dbl> <dbl> <dbl> <dbl>
1 1 1801 10 NA
2 1 1809 20 30
3 1 1812 30 60
4 1 1901 40 90
5 1 1908 50 140
6 2 1901 60 NA
7 2 1909 70 130
8 2 1912 80 210
9 2 2011 90 170
10 2 2012 100 190
Upvotes: 2