Yu Na
Yu Na

Reputation: 122

Repeat rows based on time values split across multiple columns - R

I am trying to repeat rows based on month and year values.

Currently, my df looks like this:

Country Date    Year   Month
Angola  1/2008  2008    1
Angola  6/2020  2020    6
Benin   1/2013  2013    1
Benin   6/2020  2020    6
Benin   7/2014  2014    7

For each country, I want to repeat the observations such that the df looks like this:

Country Year   Month
Angola  2008    1
Angola  2008    2
Angola  2008    3
Angola  2008    4
Angola  2008    5
Angola  2008    6

etc... all the way until 06/2020 for Angola

There is a really elegant solution to repeating rows based on values (from this post). If I were to repeat the rows only based on the years, the syntax from the solution would be like this:

df<-df %>%
  mutate(Year = readr::parse_number(Year)) %>% 
  group_by(Country)  %>%
  complete(Year =min(Year):max(Year))  

However, I want to repeat the timeframe not just based on the years, but also the months. I haven't found a good way to adapt this syntax to do this. I tried to parse the Date variable as a date and then repeat based on that, but this would assign a date to the variable and repeat the rows far more times than I need.

df<-df %>% 
  mutate(Date = readr::parse_datetime(Date)) %>% 
  group_by(Country)  %>%
  complete(Date =min(Date):max(Date))  

Any ideas about how to do this? Would prefer to adapt the syntax I've been trying, but open to new possibilities as well

Upvotes: 1

Views: 473

Answers (2)

akrun
akrun

Reputation: 887118

We remove the Date column, after grouping by 'Country', use complete with sequence of both 'Year' and 'Month'

library(dplyr)
out <- df1 %>% 
   select(-Date) %>% 
   mutate(Month2 = Month) %>% 
   group_by(Country) %>% 
   complete(Year = min(Year):max(Year), Month = first(Month):12) %>% 
   fill(Month2) %>%
   filter(Year == max(Year) & Month <= last(Month2)| Year != max(Year)) %>%
   select(-Month2)
out
# A tibble: 240 x 3
# Groups:   Country [2]
#   Country  Year Month
#   <chr>   <int> <int>
# 1 Angola   2008     1
# 2 Angola   2008     2
# 3 Angola   2008     3
# 4 Angola   2008     4
# 5 Angola   2008     5
# 6 Angola   2008     6
# 7 Angola   2008     7
# 8 Angola   2008     8
# 9 Angola   2008     9
#10 Angola   2008    10
# … with 231 more rows

-checking the output

-head

out %>%
   filter(Country == 'Angola') %>% 
   head(14)
# A tibble: 14 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2008     1
 2 Angola   2008     2
 3 Angola   2008     3
 4 Angola   2008     4
 5 Angola   2008     5
 6 Angola   2008     6
 7 Angola   2008     7
 8 Angola   2008     8
 9 Angola   2008     9
10 Angola   2008    10
11 Angola   2008    11
12 Angola   2008    12
13 Angola   2009     1
14 Angola   2009     2

-tail

out %>%
   filter(Country == 'Angola') %>% 
   tail(10)
# A tibble: 10 x 3
# Groups:   Country [1]
   Country  Year Month
   <chr>   <int> <int>
 1 Angola   2019     9
 2 Angola   2019    10
 3 Angola   2019    11
 4 Angola   2019    12
 5 Angola   2020     1
 6 Angola   2020     2
 7 Angola   2020     3
 8 Angola   2020     4
 9 Angola   2020     5
10 Angola   2020     6

data

df1 <- structure(list(Country = c("Angola", "Angola", "Benin", "Benin", 
"Benin"), Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"
), Year = c(2008L, 2020L, 2013L, 2020L, 2014L), Month = c(1L, 
6L, 1L, 6L, 7L)), class = "data.frame", row.names = c(NA, -5L
))

Upvotes: 3

Jakub.Novotny
Jakub.Novotny

Reputation: 3047

library(tidyverse)

df <- tibble(
  Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
  Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
  Year = c(2008, 2020, 2013, 2020, 2014),
  Month = c(1,6,1,6,7))


df %>%
  group_by(Country) %>%
  mutate(Date = lubridate::dmy(paste("1", Date))) %>%
  select(-Month, - Year) %>%
  complete(Date = seq(min(Date), max(Date), by = "months"))

Upvotes: 0

Related Questions