Standardizing dates in long format dataframe

Question

I have a data frame containing daily COVID cases and deaths for each state of Brazil, like this:

state  date        cases  deaths
RO     2020-03-20  1      0
RO     2020-03-21  1      0
RO     2020-03-22  3      0
RO     2020-03-23  3      0
RO     2020-03-24  3      0
RO     2020-03-25  5      0

My problem is that the states start on different dates, even though all of them end on 2020-05-24. For instance, RO starts on 2020-03-20 but AC starts on 2020-03-19. Is there any technique I can use to standardize them so that every state starts on 2020-02-26?
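To see how far apart the start dates are, you can summarise the first and last reported date for each state. This is a minimal sketch, assuming the data frame is called df1 and that 'date' is already of class Date. The toy rows below are hypothetical values for illustration only, and date_ranges is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data shaped like the question's df1 (values are illustrative)
df1 <- data.frame(
  state  = c("RO", "RO", "RO", "AC", "AC"),
  date   = as.Date(c("2020-03-20", "2020-03-21", "2020-03-22",
                     "2020-03-19", "2020-03-20")),
  cases  = c(1, 1, 3, 1, 2),
  deaths = c(0, 0, 0, 0, 0)
)

# First and last reported date for each state
date_ranges <- df1 %>%
  group_by(state) %>%
  summarise(first_date = min(date), last_date = max(date), .groups = "drop")

print(date_ranges)
```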

akrun · Accepted Answer

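Assuming that 'date' is of class Date (convert it with as.Date() first if it is character), one option is tidyr's complete():

library(dplyr)
library(tidyr)
df1 %>%
   # complete() works within each group; add other per-state columns
   # (e.g. 'region', if df1 has one) here so they are kept in the new rows
   group_by(state) %>%
   # one row per day from the common start date to each state's last date;
   # max() is used instead of last() so the result does not depend on row order
   complete(date = seq(as.Date('2020-02-26'), max(date), by = '1 day')) %>%
   ungroup()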

By default, the rows that complete() adds for the missing dates have NA in the other columns (i.e. 'cases' and 'deaths'), unless a replacement value is supplied through the fill argument, e.g. fill = list(cases = 0, deaths = 0).
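The fill argument mentioned above can be tried on a small example. This is a sketch, not the answerer's exact code: the toy rows are hypothetical, start_date and df_complete are illustrative names, and it assumes a tidyr version whose complete() works on grouped data frames.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; 'date' must already be of class Date
df1 <- data.frame(
  state  = c("RO", "RO", "AC", "AC"),
  date   = as.Date(c("2020-02-28", "2020-02-29", "2020-02-27", "2020-02-29")),
  cases  = c(1, 3, 2, 4),
  deaths = c(0, 1, 0, 0)
)

start_date <- as.Date("2020-02-26")

df_complete <- df1 %>%
  group_by(state) %>%
  # One row per day from the common start date to each state's last date;
  # newly created rows get 0 instead of NA for cases and deaths
  complete(date = seq(start_date, max(date), by = "1 day"),
           fill = list(cases = 0, deaths = 0)) %>%
  ungroup()

print(df_complete)
```

Filling with 0 is safe for the days before a state's first report. If the counts are cumulative and there are also gaps between reported dates, carrying the previous value forward (for example with tidyr's fill()) may be more appropriate for those gaps.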
