Roberto

Reputation: 181

R Calculate change in Weekly values Year on Year (with additional complication)

I have a data set of daily values. It spans from Dec-1 2018 to April-1 2020.

The columns are "date" and "value", as shown here:

date <- c("2018-12-01","2018-12-02", "2018-12-03",
     ...
      "2020-03-30","2020-03-31","2020-04-01")

value <- c(1592,1825,1769,1909,2022, .... 2287,2169,2366,2001,2087,2099,2258)

df <- data.frame(date,value)

What I would like to do is sum the values by week and then calculate the change in each weekly total from the previous year to the current year.

I know that I can sum by week using the following function:

Data_week <- df %>% group_by(week = cut(as.Date(date), "week")) %>% mutate(summed = sum(value))

My questions are twofold:

1) How do I sum by week and then manipulate the data frame so that I can calculate the year-on-year change for each week (e.g. week of Dec. 1 2019 / week of Dec. 1 2018)?

2) How can I do the above, but using a "customized" week? Let's say I want to define a week as moving 7 days back from the latest date I have data for. E.g. the latest week I would have would be the week starting on March 26th (April 1st -7 days).

Upvotes: 3

Views: 1148

Answers (1)

Ian Campbell

Reputation: 24790

We can use lag() from dplyr, along with some convenience functions from lubridate, to sum by week and take the difference from the preceding week.

library(dplyr)
library(lubridate)
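df %>% 
  mutate(year = year(date)) %>%
  # week() numbers 7-day blocks counted from January 1
  group_by(week = week(date), year) %>%
  summarize(summed = sum(value)) %>%
  arrange(year, week) %>%
  ungroup() %>%
  # difference from the preceding week in the series
  mutate(change = summed - lag(summed))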
#    week  year summed  change
#   <dbl> <dbl>  <dbl>   <dbl>
# 1    48  2018  3638.     NA 
# 2    49  2018 15316.  11678.
# 3    50  2018 13283.  -2033.
# 4    51  2018 15166.   1883.
# 5    52  2018 12885.  -2281.
# 6    53  2018  1982. -10903.
# 7     1  2019 14177.  12195.
# 8     2  2019 14969.    791.
# 9     3  2019 14554.   -415.
#10     4  2019 12850.  -1704.
#11     5  2019  1907. -10943.
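
Note that the pipeline above differences consecutive weeks. For a year-on-year comparison, one option is to lag within each week number instead, so that each week is compared with the same week number one year earlier. The following is only a sketch under a few assumptions: df2 is a made-up data frame covering the question's full date range (the sample data in this answer only spans 60 days, so no week number repeats), and the partial weeks at the edges of the data, as well as the short week 53, are compared as they are.

```r
library(dplyr)
library(lubridate)

# Hypothetical data covering the date range described in the question
set.seed(1)
df2 <- data.frame(
  date = seq.Date(as.Date("2018-12-01"), as.Date("2020-04-01"), by = "day")
)
df2$value <- runif(nrow(df2), 1500, 2500)

yoy <- df2 %>%
  mutate(year = year(date), week = week(date)) %>%
  group_by(year, week) %>%
  summarize(summed = sum(value), .groups = "drop") %>%
  # within each week number, order by year so lag() looks at the prior year
  group_by(week) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    # only use the lagged total if it really belongs to the previous year
    prev_summed = if_else(lag(year) == year - 1, lag(summed), NA_real_),
    change = summed - prev_summed,   # absolute year-on-year change
    ratio = summed / prev_summed     # e.g. week 1 of 2020 / week 1 of 2019
  ) %>%
  ungroup()
```

Weeks with no counterpart in the previous year (everything in 2018 here, and weeks that only occur in 2019) get NA in prev_summed, change and ratio.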

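If you would like to define "weeks" in different ways, there is also isoweek (ISO 8601 weeks, which start on Monday) and epiweek (weeks that start on Sunday), whereas week counts complete 7-day periods from January 1. See this answer for a great explanation of your options.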
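
For the "customized" week in the second part of the question, none of these functions anchor the week to the latest date, but the week index can be computed by hand. This is a rough sketch, not a definitive approach: df2 is again a made-up data frame, weeks_back, week_start and n_days are names introduced here for illustration, and "the same week last year" is assumed to mean the 7-day window exactly 52 weeks (364 days) earlier.

```r
library(dplyr)

# Hypothetical data covering the date range described in the question
set.seed(1)
df2 <- data.frame(
  date = seq.Date(as.Date("2018-12-01"), as.Date("2020-04-01"), by = "day")
)
df2$value <- runif(nrow(df2), 1500, 2500)

latest <- max(df2$date)

custom <- df2 %>%
  mutate(
    # 0 = the 7 days ending on the latest date, 1 = the 7 days before that, ...
    weeks_back = as.integer(difftime(latest, date, units = "days")) %/% 7L,
    # first day of each 7-day window
    week_start = latest - (weeks_back * 7L + 6L)
  ) %>%
  group_by(weeks_back, week_start) %>%
  summarize(summed = sum(value), n_days = n(), .groups = "drop")

# Assumption: compare each window with the one 52 weeks earlier
custom_yoy <- custom %>%
  left_join(
    custom %>% transmute(weeks_back = weeks_back - 52L, prev_summed = summed),
    by = "weeks_back"
  ) %>%
  mutate(change = summed - prev_summed)
```

With data ending on April 1st, the latest window starts on March 26th. The n_days column makes it easy to spot the oldest window, which will usually be incomplete.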

Data

set.seed(1)
df <- data.frame(
  date = seq.Date(from = as.Date("2018-12-01"), to = as.Date("2019-01-29"), by = "days"),
  value = runif(60, 1500, 2500)
)

Upvotes: 3
