Amir Habibdoust
Amir Habibdoust

Reputation: 27

Convert Daily Data into Weekly Data and summarize multiple columns in R

I want to change the following data set :

date          A   B
01/01/2018  391 585
02/01/2018  420 595
03/01/2018  455 642
04/01/2018  469 654
05/01/2018  611 900
06/01/2018  449 640
07/01/2018  335 522
08/01/2018  726 955
09/01/2018  676 938
10/01/2018  508 740
11/01/2018  562 778
12/01/2018  561 761
13/01/2018  426 609
14/01/2018  334 508

My desired output is as follows:

date           A       B
07/01/2018  3130    4538
14/01/2018  3793    5289

In which, the numbers of Columns A and B are summed over seven days of weeks. Indeed, I want to convert daily data to weekly data. I found two solutions on the Stackoverflow website. One solution would be using library(tidyquant), and following code

library(tidyquant)
newfd<-df %>%
  tq_transmute(select     = A,
               mutate_fun = apply.weekly,
               FUN        = sum)

The code produces weekly data for Column A while I need all columns. (I have many columns). I also used the following code. But, I don’t know how I can develop the code for all columns.

library(slider)   
slide_period_dfr(.x = califo, .i=as.Date(califo$date), 
                 .period = "week", 
                 .f = ~data.frame(week_ending = tail(.x$ date,1),
                                  week_freq = sum(.x$A)),
                 .origin = as.Date("2018-01-01"))

Upvotes: 1

Views: 1300

Answers (3)

josep maria porr&#224;
josep maria porr&#224;

Reputation: 1388

You can compute an index id = 0:(nrow(df) - 1), once the series is arranged by date, and use that to define to which period (week) belongs each date week = id %/% 7. Date linked to each week is chosen in date = max(date) to be the last date of the week. Other options are possible.

library(dplyr)
library(lubridate)

df <- tribble(~date, ~A, ~B,
"01/01/2018", 391, 585,
"02/01/2018", 420, 595,
"03/01/2018", 455, 642,
"04/01/2018", 469, 654,
"05/01/2018", 611, 900,
"06/01/2018", 449, 640,
"07/01/2018", 335, 522,
"08/01/2018", 726, 955,
"09/01/2018", 676, 938,
"10/01/2018", 508, 740,
"11/01/2018", 562, 778,
"12/01/2018", 561, 761,
"13/01/2018", 426, 609,
"14/01/2018", 334, 508)

df %>%
  mutate(date = dmy(date)) %>% 
  arrange(date) %>% 
  mutate(id = 0:(nrow(df) - 1), week = id %/% 7) %>%
  group_by(week) %>% 
  summarize(date = max(date), across(A:B, sum))

#> # A tibble: 2 x 4
#>    week date           A     B
#>   <dbl> <date>     <dbl> <dbl>
#> 1     0 2018-01-07  3130  4538
#> 2     1 2018-01-14  3793  5289

Created on 2021-06-05 by the reprex package (v0.3.0)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

You can use ceiling_date to make the dates to weekly dates and sum multiple variables with across in dplyr.

library(dplyr)
library(lubridate)

df %>%
  group_by(date = ceiling_date(dmy(date), 'week', week_start = 1)) %>%
  summarise(across(A:B, sum))

#  date           A     B
#  <date>     <int> <int>
#1 2018-01-08  3130  4538
#2 2018-01-15  3793  5289

data

df <- structure(list(date = c("01/01/2018", "02/01/2018", "03/01/2018", 
"04/01/2018", "05/01/2018", "06/01/2018", "07/01/2018", "08/01/2018", 
"09/01/2018", "10/01/2018", "11/01/2018", "12/01/2018", "13/01/2018", 
"14/01/2018"), A = c(391L, 420L, 455L, 469L, 611L, 449L, 335L, 
726L, 676L, 508L, 562L, 561L, 426L, 334L), B = c(585L, 595L, 
642L, 654L, 900L, 640L, 522L, 955L, 938L, 740L, 778L, 761L, 609L, 
508L)), class = "data.frame", row.names = c(NA, -14L))

Upvotes: 2

Andy Eggers
Andy Eggers

Reputation: 612

You could pivot_longer() so that you have only one column of data to transform, apply the function to that column, and then pivot_wider().

Here is a trivial example with mtcars:

library(tidyverse)
mtcars %>%
  rownames_to_column(var = "car") %>% 
  select(car, mpg, cyl) %>% 
  pivot_longer(cols = c(mpg, cyl), names_to = "var") %>% 
  mutate(value = value^2) %>% 
  pivot_wider(names_from = var, names_prefix = "squared_")

# A tibble: 32 x 3
   car               squared_mpg squared_cyl
   <chr>                   <dbl>       <dbl>
 1 Mazda RX4                441           36
 2 Mazda RX4 Wag            441           36
 3 Datsun 710               520.          16
 4 Hornet 4 Drive           458.          36
 5 Hornet Sportabout        350.          64
 6 Valiant                  328.          36
 7 Duster 360               204.          64
 8 Merc 240D                595.          16
 9 Merc 230                 520.          16
10 Merc 280                 369.          36
# … with 22 more rows

Your aggregation would replace my mutate step.

Whether this is more concise than repeatedly creating new variables depends on how many variables you are dealing with.

Upvotes: 0

Related Questions