jerH
jerH

Reputation: 1299

Aggregating daily data to weekly, ending today

I'm currently building some charts of covid-related data....my script goes out and downloads most recent data and goes from there. I wind up with dataframes that look like

head(NMdata)

    Date state positiveIncrease totalTestResultsIncrease
1 2020-05-19    NM              158                     4367
2 2020-05-18    NM               81                     4669
3 2020-05-17    NM              195                     4126
4 2020-05-16    NM              159                     4857
5 2020-05-15    NM              139                     4590
6 2020-05-14    NM              152                     4722

I've been aggregating to weekly data using the tq_transmute function from tidyquant.

NMweeklyPos <- NMdata %>% tq_transmute(select = positiveIncrease, mutate_fun = apply.weekly, FUN=sum)

This works, but it aggregates on week of the year, with weeks starting on Sunday.

head(NMweeklyPos)

Date                positiveIncrease
  <dttm>                         <int>
1 2020-03-08 00:00:00                0
2 2020-03-15 00:00:00               13
3 2020-03-22 00:00:00               44
4 2020-03-29 00:00:00              180
5 2020-04-05 00:00:00              306
6 2020-04-12 00:00:00              631

So for instance if I ran it today (which happens to be a Wednesday) my last entry is a partial week with Monday, Tuesday, Wednesday.

tail(NMweeklyPos)

 Date                positiveIncrease
  <dttm>                         <int>
1 2020-04-19 00:00:00              624
2 2020-04-26 00:00:00              862
3 2020-05-03 00:00:00             1072
4 2020-05-10 00:00:00             1046
5 2020-05-17 00:00:00             1079
6 2020-05-19 00:00:00              239

For purposes of my chart this winds up being a small value, and so I have been discarding the partial weeks at the end, but that means I'm throwing out the most recent data.

I would prefer the throw out a partial week from the start of the dataset and have the aggregation automatically use weeks that end on whatever day the script is being run. So if I ran it today (Wednesday) it would aggregate on weeks ending Wednesday so that I had the most current data included...I could drop the partial week from the beginning of the data. But tomorrow it would choose weeks ending Thursday, etc. And I don't want to have to hardcode the week end day and change it each time.

How can I go about achieving that?

Upvotes: 1

Views: 210

Answers (1)

cbaylor
cbaylor

Reputation: 48

Using lubridate, the below code will find what day of the week it is and define that day as the floor for each week.

Hope this helps!

library(lubridate)
library(dplyr)

end = as.Date("2020-04-14")
data = data.frame(
  date = seq.Date(as.Date("2020-01-01"), end, by = "day"),
  val = 1
)

# get the day of the week
weekday = wday(end)

# using the floor_date function we can use todays date to determine what day of the week will be our floor
data%>%
  mutate(week = floor_date(date, "week", week_start = weekday))%>%
  group_by(week)%>%
  summarise(total = sum(val))

Upvotes: 2

Related Questions