Repeat rows based on time values split across multiple columns - R

Question

I am trying to repeat rows based on month and year values.

Currently, my df looks like this:

Country Date    Year   Month
Angola  1/2008  2008    1
Angola  6/2020  2020    6
Benin   1/2013  2013    1
Benin   6/2020  2020    6
Benin   7/2014  2014    7

For each country, I want to repeat the observations such that the df looks like this:

Country Year   Month
Angola  2008    1
Angola  2008    2
Angola  2008    3
Angola  2008    4
Angola  2008    5
Angola  2008    6

etc... all the way until 06/2020 for Angola

There is a really elegant solution to repeating rows based on values (from this post). If I were to repeat the rows only based on the years, the syntax from the solution would be like this:

df<-df %>%
  mutate(Year = readr::parse_number(Year)) %>% 
  group_by(Country)  %>%
  complete(Year =min(Year):max(Year))

However, I want to repeat the timeframe based not just on the years, but also on the months. I haven't found a good way to adapt this syntax to do this. I tried parsing the Date variable as a date and then repeating based on that, but this assigns a full date to each value and repeats the rows far more times than I need.

df<-df %>% 
  mutate(Date = readr::parse_datetime(Date)) %>% 
  group_by(Country)  %>%
  complete(Date =min(Date):max(Date))

Any ideas about how to do this? I would prefer to adapt the syntax I've been trying, but I'm open to new possibilities as well.

Jakub.Novotny · Accepted Answer

library(tidyverse)

df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
  Year = c(2008, 2020, 2013, 2020, 2014),
  Month = c(1,6,1,6,7))


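df %>%
  group_by(Country) %>%
  # Prepend a day so that "1/2008" becomes "1 1/2008", which dmy() can parse
  mutate(Date = lubridate::dmy(paste("1", Date))) %>%
  # Drop the original Year and Month; they would be NA in the newly added rows
  select(-Month, -Year) %>%
  # Fill in every month between each country's first and last date
  complete(Date = seq(min(Date), max(Date), by = "months"))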
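
The code above returns only Country and Date, while the question asks for Country, Year and Month. A minimal sketch of one way to get that shape is to build the date from the existing Year and Month columns and derive both columns again after completing the sequence. The name `monthly` and the use of `lubridate::make_date()` are my own choices here, not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same example data as in the question, without the Date string
df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Year    = c(2008, 2020, 2013, 2020, 2014),
  Month   = c(1, 6, 1, 6, 7)
)

monthly <- df %>%
  # First day of each month, built from the numeric columns
  mutate(Date = make_date(Year, Month, 1)) %>%
  select(Country, Date) %>%
  group_by(Country) %>%
  # One row per month between each country's earliest and latest date
  complete(Date = seq(min(Date), max(Date), by = "month")) %>%
  ungroup() %>%
  # Recover Year and Month for every row, including the newly added ones
  mutate(Year = year(Date), Month = month(Date)) %>%
  select(Country, Year, Month)
```

With the example data this should give 150 rows for Angola (1/2008 to 6/2020) and 90 rows for Benin (1/2013 to 6/2020). Stepping by "month" rather than using min(Date):max(Date) is what avoids the extra rows, because the colon operator counts in the underlying unit of the date or datetime (days or seconds) rather than in months.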
