I have a data frame containing daily COVID cases and deaths for each state of Brazil, like this:
state date cases deaths
RO 2020-03-20 1 0
RO 2020-03-21 1 0
RO 2020-03-22 3 0
RO 2020-03-23 3 0
RO 2020-03-24 3 0
RO 2020-03-25 5 0
My problem is that the states start on different dates, even though all of them end on 2020-05-24. For instance, RO starts on 2020-03-20 but AC starts on 2020-03-19. Is there any technique I can use to standardize them so that every state starts on 2020-02-26?
Assuming that `date` is of class `Date`, one option would be `complete` from tidyr, which adds the missing dates within each state:
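```r
library(dplyr)
library(tidyr)

df1 %>%
  # one group per state (add `region` here if your data has that column)
  group_by(state) %>%
  # every day from 2020-02-26 to the state's latest date; max() does not depend on row order
  complete(date = seq(as.Date('2020-02-26'), max(date), by = '1 day')) %>%
  ungroup()
```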
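To see the padding in action, here is a small self-contained run. It is a sketch that assumes the column names from the question: the `RO` rows are copied from the question, while the single `AC` row is a hypothetical value added only so there are two states. It groups by `state` alone, since the sample has no `region` column.

```r
library(dplyr)
library(tidyr)

# Sample data: RO rows come from the question; the AC row is hypothetical
df1 <- data.frame(
  state  = c(rep("RO", 6), "AC"),
  date   = as.Date(c("2020-03-20", "2020-03-21", "2020-03-22", "2020-03-23",
                     "2020-03-24", "2020-03-25", "2020-03-19")),
  cases  = c(1, 1, 3, 3, 3, 5, 1),
  deaths = rep(0, 7)
)

out <- df1 %>%
  group_by(state) %>%
  # one row per day from 2020-02-26 to each state's latest date
  complete(date = seq(as.Date("2020-02-26"), max(date), by = "1 day")) %>%
  ungroup()

# Per-state check: first date, last date and number of rows
rng <- out %>%
  group_by(state) %>%
  summarise(first = min(date), last = max(date), n = n())
print(rng)
```

With this sample, `AC` ends up with 23 rows and `RO` with 29, both starting on 2020-02-26 (2020 is a leap year, so February contributes four days). The added rows have `NA` in `cases` and `deaths`.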
By default, the rows that `complete` adds have `NA` in the other columns (here `cases` and `deaths`). To use another value, such as 0, pass a named list to the `fill` argument, e.g. `fill = list(cases = 0, deaths = 0)`.
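If the padded days should count as zero rather than unknown, `fill` takes a named list of replacement values. A minimal sketch, again using the question's `RO` rows plus one hypothetical `AC` row, and assuming that zero is the right value for days before a state's first report:

```r
library(dplyr)
library(tidyr)

# Sample data: RO rows come from the question; the AC row is hypothetical
df1 <- data.frame(
  state  = c(rep("RO", 6), "AC"),
  date   = as.Date(c("2020-03-20", "2020-03-21", "2020-03-22", "2020-03-23",
                     "2020-03-24", "2020-03-25", "2020-03-19")),
  cases  = c(1, 1, 3, 3, 3, 5, 1),
  deaths = rep(0, 7)
)

out0 <- df1 %>%
  group_by(state) %>%
  complete(
    date = seq(as.Date("2020-02-26"), max(date), by = "1 day"),
    # value used for cases/deaths on the rows that complete() adds
    fill = list(cases = 0, deaths = 0)
  ) %>%
  ungroup()

# The padded rows hold 0 instead of NA, so the totals are unchanged
print(colSums(is.na(out0[c("cases", "deaths")])))
print(sum(out0$cases))
```

In tidyr 1.2.0 and later, `fill` also replaces `NA`s that were already present in those columns; pass `explicit = FALSE` to fill only the newly added rows.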