Reputation: 122
I am trying to repeat rows based on month and year values.
Currently, my df looks like this:
Country Date Year Month
Angola 1/2008 2008 1
Angola 6/2020 2020 6
Benin 1/2013 2013 1
Benin 6/2020 2020 6
Benin 7/2014 2014 7
For each country, I want to repeat the observations such that the df looks like this:
Country Year Month
Angola 2008 1
Angola 2008 2
Angola 2008 3
Angola 2008 4
Angola 2008 5
Angola 2008 6
etc... all the way until 06/2020 for Angola
There is a really elegant solution to repeating rows based on values (from this post). If I were to repeat the rows only based on the years, the syntax from the solution would be like this:
df<-df %>%
mutate(Year = readr::parse_number(Year)) %>%
group_by(Country) %>%
complete(Year =min(Year):max(Year))
However, I want to repeat the timeframe not just based on the years, but also the months. I haven't found a good way to adapt this syntax to do this. I tried to parse the Date
variable as a date and then repeat based on that, but this would assign a date to the variable and repeat the rows far more times than I need.
df<-df %>%
mutate(Date = readr::parse_datetime(Date)) %>%
group_by(Country) %>%
complete(Date =min(Date):max(Date))
Any ideas about how to do this? Would prefer to adapt the syntax I've been trying, but open to new possibilities as well
Upvotes: 1
Views: 473
Reputation: 887118
We remove the Date
column, after grouping by 'Country', use complete
with sequence of both 'Year' and 'Month'
library(dplyr)
out <- df1 %>%
select(-Date) %>%
mutate(Month2 = Month) %>%
group_by(Country) %>%
complete(Year = min(Year):max(Year), Month = first(Month):12) %>%
fill(Month2) %>%
filter(Year == max(Year) & Month <= last(Month2)| Year != max(Year)) %>%
select(-Month2)
out
# A tibble: 240 x 3
# Groups: Country [2]
# Country Year Month
# <chr> <int> <int>
# 1 Angola 2008 1
# 2 Angola 2008 2
# 3 Angola 2008 3
# 4 Angola 2008 4
# 5 Angola 2008 5
# 6 Angola 2008 6
# 7 Angola 2008 7
# 8 Angola 2008 8
# 9 Angola 2008 9
#10 Angola 2008 10
# … with 231 more rows
-checking the output
-head
out %>%
filter(Country == 'Angola') %>%
head(14)
# A tibble: 14 x 3
# Groups: Country [1]
Country Year Month
<chr> <int> <int>
1 Angola 2008 1
2 Angola 2008 2
3 Angola 2008 3
4 Angola 2008 4
5 Angola 2008 5
6 Angola 2008 6
7 Angola 2008 7
8 Angola 2008 8
9 Angola 2008 9
10 Angola 2008 10
11 Angola 2008 11
12 Angola 2008 12
13 Angola 2009 1
14 Angola 2009 2
-tail
out %>%
filter(Country == 'Angola') %>%
tail(10)
# A tibble: 10 x 3
# Groups: Country [1]
Country Year Month
<chr> <int> <int>
1 Angola 2019 9
2 Angola 2019 10
3 Angola 2019 11
4 Angola 2019 12
5 Angola 2020 1
6 Angola 2020 2
7 Angola 2020 3
8 Angola 2020 4
9 Angola 2020 5
10 Angola 2020 6
df1 <- structure(list(Country = c("Angola", "Angola", "Benin", "Benin",
"Benin"), Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"
), Year = c(2008L, 2020L, 2013L, 2020L, 2014L), Month = c(1L,
6L, 1L, 6L, 7L)), class = "data.frame", row.names = c(NA, -5L
))
Upvotes: 3
Reputation: 3047
library(tidyverse)
df <- tibble(
Country = c("Angola", "Angola", "Benin", "Benin", "Benin"),
Date = c("1/2008", "6/2020", "1/2013", "6/2020", "7/2014"),
Year = c(2008, 2020, 2013, 2020, 2014),
Month = c(1,6,1,6,7))
df %>%
group_by(Country) %>%
mutate(Date = lubridate::dmy(paste("1", Date))) %>%
select(-Month, - Year) %>%
complete(Date = seq(min(Date), max(Date), by = "months"))
Upvotes: 0