mathguy_666

Reputation: 29

Standardizing dates in long format dataframe

I have a data frame containing daily COVID cases and deaths for each state of Brazil, like this:

state   date                cases            deaths
 RO 2020-03-20               1                0
 RO 2020-03-21               1                0
 RO 2020-03-22               3                0
 RO 2020-03-23               3                0
 RO 2020-03-24               3                0
 RO 2020-03-25               5                0

My problem is that the states start on different dates, even though all of them end on 2020-05-24. For instance, RO starts on 2020-03-20 but AC starts on 2020-03-19. Is there any technique I can use to standardize them so that every state starts on 2020-02-26?

Upvotes: 1

Views: 75

Answers (1)

akrun

Reputation: 887541

Assuming that 'date' is of Date class, one option would be complete

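library(dplyr)
library(tidyr)
df1 %>%
   # complete each state separately; add other state-level columns
   # (e.g. region) to group_by if the data has them
   group_by(state) %>%
   # build the full daily sequence from the common start date to each
   # state's latest date (max() does not depend on row order)
   complete(date = seq(as.Date('2020-02-26'), max(date), by = '1 day')) %>%
   ungroup()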

By default, the rows that complete adds for the missing dates have NA in the other columns ('cases', 'deaths'), unless we supply replacement values with the fill argument
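
As a rough sketch of the fill argument, here is a small made-up example. The toy df1 below is only for illustration (not the OP's data), and filling with 0 assumes the columns hold cumulative counts, so "no record yet" can be read as zero cases and deaths.

```r
library(dplyr)
library(tidyr)

# Toy data, made up for illustration: two states starting on different dates
df1 <- tibble(
  state  = c("RO", "RO", "AC", "AC", "AC"),
  date   = as.Date(c("2020-03-20", "2020-03-21",
                     "2020-03-19", "2020-03-20", "2020-03-21")),
  cases  = c(1, 1, 2, 3, 3),
  deaths = c(0, 0, 0, 0, 0)
)

out <- df1 %>%
  group_by(state) %>%
  # add the missing days and put 0 (instead of NA) in the new rows
  complete(date = seq(as.Date("2020-02-26"), max(date), by = "1 day"),
           fill = list(cases = 0, deaths = 0)) %>%
  ungroup()

# check that every state now starts on the same date
out %>%
  group_by(state) %>%
  summarise(start = min(date), end = max(date), n_days = n())
```

If NA is the more honest value for days before the first report, just drop the fill argument and keep the default.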

Upvotes: 2
