R Studio - get the summation for the rolling month

Question

I have data frame like below, and I want to get the summation(value) for each 4 rolling month.

Edit: In the output I have "2018-12". But it's not shown in the input. It's a typo, my actual data contain "2018-12".

I prefer to use dplyr:

group <- c("red","green","red","red","red","green","green","green","red","green","green","green")
Month <- c("2019-01","2019-02","2019-03","2019-03","2019-05","2019-07","2019-07","2019-08","2019-09","2019-10","2019-10","2019-10")
VALUE <- c(10,20,30,40,50,60,70,80,90,100,110,120)
d_f <- data.frame(group,Month,VALUE)

d_f %>%
  group_by(group) %>%
  summarise(value = sum(value))

Can anyone please help me with how to handle the 4 rolling month? Thanks a lot for your valuable time.

Ben · Accepted Answer

Using lubridate you can use floor_date and group your dates by 4 month intervals.

library(tidyverse)
library(lubridate)

d_f %>%
  mutate(date = as.Date(paste0(Month, '-01'), format = "%Y-%m-%d")) %>%
  arrange(date) %>%
  group_by(group, startdategroup = floor_date(date, "4 months")) %>%
  summarise(value = sum(VALUE)) %>%
  mutate(enddategroup = startdategroup %m+% months(4) - 1)

Output

# A tibble: 6 x 4
# Groups:   group [2]
  group startdategroup value enddategroup
                   
1 green 2019-01-01        20 2019-04-30  
2 green 2019-05-01       210 2019-08-31  
3 green 2019-09-01       330 2019-12-31  
4 red   2019-01-01        80 2019-04-30  
5 red   2019-05-01        50 2019-08-31  
6 red   2019-09-01        90 2019-12-31

Edit: To allow for an "overlap month" (months on the edge of two sequential date intervals), I might take a different approach.

First, I might create a sequence of start and end dates for the intervals (based on minimum and maximum dates in your data frame). The sequence would have date intervals every 4 months.

Then, I would do a fuzzy_left_join (using >= and <= logic) and merge this new data frame with yours. Then a row of data for a single month could be counted twice (once for each of two different intervals).

library(fuzzyjoin)

d_f$date = as.Date(paste0(Month, '-01'), format = "%Y-%m-%d")

d_f2 <- data.frame(date_start = seq.Date(min(d_f$date), max(d_f$date), "4 months"))
d_f2$date_end = date_start %m+% months(4)

d_f %>%
  fuzzy_left_join(d_f2, 
                  by = c("date" = "date_start", "date" = "date_end"), 
                  match_fun = list(`>=`, `<=`)) %>%
  group_by(group, date_start, date_end) %>%
  summarise(value = sum(VALUE))

Output

# A tibble: 6 x 4
# Groups:   group, date_start [6]
  group date_start date_end   value
             
1 green 2019-01-01 2019-05-01    20
2 green 2019-05-01 2019-09-01   210
3 green 2019-09-01 2020-01-01   330
4 red   2019-01-01 2019-05-01   130
5 red   2019-05-01 2019-09-01   140
6 red   2019-09-01 2020-01-01    90

R Studio - get the summation for the rolling month

Answers (2)

Related Questions