Reputation: 1299
I'm currently building some charts of covid-related data....my script goes out and downloads most recent data and goes from there. I wind up with dataframes that look like
head(NMdata)
Date state positiveIncrease totalTestResultsIncrease
1 2020-05-19 NM 158 4367
2 2020-05-18 NM 81 4669
3 2020-05-17 NM 195 4126
4 2020-05-16 NM 159 4857
5 2020-05-15 NM 139 4590
6 2020-05-14 NM 152 4722
I've been aggregating to weekly data using the tq_transmute
function from tidyquant
.
NMweeklyPos <- NMdata %>% tq_transmute(select = positiveIncrease, mutate_fun = apply.weekly, FUN=sum)
This works, but it aggregates on week of the year, with weeks starting on Sunday.
head(NMweeklyPos)
Date positiveIncrease
<dttm> <int>
1 2020-03-08 00:00:00 0
2 2020-03-15 00:00:00 13
3 2020-03-22 00:00:00 44
4 2020-03-29 00:00:00 180
5 2020-04-05 00:00:00 306
6 2020-04-12 00:00:00 631
So for instance if I ran it today (which happens to be a Wednesday) my last entry is a partial week with Monday, Tuesday, Wednesday.
tail(NMweeklyPos)
Date positiveIncrease
<dttm> <int>
1 2020-04-19 00:00:00 624
2 2020-04-26 00:00:00 862
3 2020-05-03 00:00:00 1072
4 2020-05-10 00:00:00 1046
5 2020-05-17 00:00:00 1079
6 2020-05-19 00:00:00 239
For purposes of my chart this winds up being a small value, and so I have been discarding the partial weeks at the end, but that means I'm throwing out the most recent data.
I would prefer the throw out a partial week from the start of the dataset and have the aggregation automatically use weeks that end on whatever day the script is being run. So if I ran it today (Wednesday) it would aggregate on weeks ending Wednesday so that I had the most current data included...I could drop the partial week from the beginning of the data. But tomorrow it would choose weeks ending Thursday, etc. And I don't want to have to hardcode the week end day and change it each time.
How can I go about achieving that?
Upvotes: 1
Views: 210
Reputation: 48
Using lubridate, the below code will find what day of the week it is and define that day as the floor for each week.
Hope this helps!
library(lubridate)
library(dplyr)
end = as.Date("2020-04-14")
data = data.frame(
date = seq.Date(as.Date("2020-01-01"), end, by = "day"),
val = 1
)
# get the day of the week
weekday = wday(end)
# using the floor_date function we can use todays date to determine what day of the week will be our floor
data%>%
mutate(week = floor_date(date, "week", week_start = weekday))%>%
group_by(week)%>%
summarise(total = sum(val))
Upvotes: 2