user20203146
user20203146

Reputation: 498

How to Disaggregate Monthly Data into Weekly Basis?

I am trying to disaggregate the monthly data and spread them into weekly data in two ways.

The data which I am working with is given below:

structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"), `2022-11-01` = c(100, 
200, 300, 400), `2022-12-01` = c(160, 200, 300, 400), `2023-01-01` = c(500, 
550, 600, 650)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-4L))

And it looks like this:

enter image description here

The expected output 1 is given below, as you can see all the dates are Mondays:

enter image description here

The expected output 2 is given below:

enter image description here

Is this doable, or is it a bit too much to expect from R?

Upvotes: 2

Views: 775

Answers (1)

M--
M--

Reputation: 29238

For Mondays we can create a list of Mondays between the dates in the dataframe, join it with the data in long format, count number of the Mondays for each variable in each month, divide the values by the number of Mondays, and revert back the format to wide;

library(dplyr)
library(tidyr)
library(lubridate)
all_dates <- as.Date(names(df1)[-1])

MON <- seq(min(floor_date(all_dates, "month")), 
           max(ceiling_date(all_dates, "month")),
           by="1 day") %>% 
      .[wday(.,label = TRUE) == "Mon"] %>% 
      data.frame("Mondays" = .) %>% 
      mutate(mmm = format(Mondays, "%Y-%m"))
  
df1 %>% 
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>% 
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m"))  %>%
  right_join(MON, by = "mmm") %>% 
  arrange(mmm) %>% 
  group_by(`Row Labels`, dates) %>% 
  mutate(value = value / n()) %>% 
  ungroup() %>% 
  select(`Row Labels`, Mondays, value) %>% 
  pivot_wider(`Row Labels`, names_from = "Mondays", values_from = "value")
#> # A tibble: 4 x 14
#>   `Row Labels` `2022-11-07` `2022-11-14` `2022-11-21` `2022-11-28` `2022-12-05`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     25           25           25           25           40
#> 2 X7                     50           50           50           50           50
#> 3 X8                     75           75           75           75           75
#> 4 X9                    100          100          100          100          100
#> # ... with 8 more variables: 2022-12-12 <dbl>, 2022-12-19 <dbl>,
#> #   2022-12-26 <dbl>, 2023-01-02 <dbl>, 2023-01-09 <dbl>, 2023-01-16 <dbl>,
#> #   2023-01-23 <dbl>, 2023-01-30 <dbl>

Same principal goes to doing it weekly:

WKLY <- seq(min(floor_date(all_dates, "month")), 
            max(ceiling_date(all_dates, "month")), 
            by="week") %>% 
          data.frame("Weekly" = .) %>% 
          mutate(mmm = format(Weekly, "%Y-%m"))

df1 %>% 
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>% 
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m"))  %>%
  right_join(WKLY, by = "mmm") %>% 
  arrange(mmm) %>% 
  group_by(`Row Labels`, dates) %>% 
  mutate(value = value / n()) %>% 
  ungroup() %>% 
  select(`Row Labels`, Weekly, value) %>% 
  pivot_wider(`Row Labels`, names_from = "Weekly", values_from = "value")
#> # A tibble: 4 x 15
#>   `Row Labels` `2022-11-01` `2022-11-08` `2022-11-15` `2022-11-22` `2022-11-29`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     20           20           20           20           20
#> 2 X7                     40           40           40           40           40
#> 3 X8                     60           60           60           60           60
#> 4 X9                     80           80           80           80           80
#> # ... with 9 more variables: 2022-12-06 <dbl>, 2022-12-13 <dbl>,
#> #   2022-12-20 <dbl>, 2022-12-27 <dbl>, 2023-01-03 <dbl>, 2023-01-10 <dbl>,
#> #   2023-01-17 <dbl>, 2023-01-24 <dbl>, 2023-01-31 <dbl>

Data:

df1 <- structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"), 
                      `2022-11-01` = c(100, 200, 300, 400), 
                      `2022-12-01` = c(160, 200, 300, 400), 
                      `2023-01-01` = c(500, 550, 600, 650)), 
                 class = c("tbl_df", "tbl", "data.frame"), 
                 row.names = c(NA, -4L))

Upvotes: 3

Related Questions