Reputation: 85
I need to "back-fill" dates for a 3 year time period (2016-2018) and My actual data has over 32,000 observations and is a table of dates, countries, regions within a country, and protests and riot events.
For simplicity, let's say the data is limited to the month January with 3 countries. I want to eventually do a lagged data panel, but to do so I need all of the dates for the time period accounted for (I think).
DT <- data.table(Date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-12", "2019-01-08" )),
Country = c("India","India","India","Pakistan","Pakistan", "Cameroon", "India"),
Region = c('Kashmir', 'Rajasthan', 'Punjab', 'Islamabad', 'National', 'Nord-Ouest', "Kashmir"),
Protest = c(4,2,0,1,4,1,0 ),
Riot = c(0,2,1,1,4,1,1 ))
# Date Country Region Protest Riot
# 1: 2019-01-01 India Kashmir 4 0
# 2: 2019-01-01 India Rajasthan 2 2
# 3: 2019-01-01 India Punjab 0 1
# 4: 2019-01-01 Pakistan Islamabad 1 1
# 5: 2019-01-01 Pakistan National 4 4
# 6: 2019-01-12 Cameroon Nord-Ouest 1 1
# 7: 2019-01-08 India Kashmir 0 1
I could make and merge a new data table with dates for the month of January and count out the number of repetitions for each country, but that is not feasible for 32 Countries and their regions. Is there a way to account for countries having different numbers of entry for any given date and then filling in rows so that (in this case) each country and region would have a date represented for everyday of the month? a desired output would be along the lines of:
# Date Country Region Protest Riot
# 1: 2019-01-01 India Kashmir 4 0
# 2: 2019-01-01 India Rajasthan 2 2
# 3: 2019-01-01 India Punjab 0 1
# 4: 2019-01-01 Pakistan Islamabad 1 1
# 5: 2019-01-01 Pakistan National 4 4
# 6: 2019-01-01 Cameroon Nord-Ouest NA NA
# 7: 2019-01-02 India Kashmir NA NA
# 8: 2019-01-02 India Rajasthan NA NA
# 9: 2019-01-02 India Punjab NA Na
# 10: 2019-01-02 Pakistan Islamabad NA NA
# 11: 2019-01-02 Pakistan National NA NA
# 12: 2019-01-02 Cameroon Nord-Ouest NA NA
Upvotes: 1
Views: 120
Reputation: 887223
May be we can use complete
library(tidyverse)
DT %>%
mutate_if(is.character, ~ factor(., levels = unique(.))) %>%
group_by(Country, Region) %>%
complete(Date = seq(min(.$Date), max(.$Date), by = 'day')) %>%
ungroup %>%
arrange(Date) %>%
head(12)
# A tibble: 12 x 5
# Country Region Date Protest Riot
# <fct> <fct> <date> <dbl> <dbl>
# 1 India Kashmir 2019-01-01 4 0
# 2 India Rajasthan 2019-01-01 2 2
# 3 India Punjab 2019-01-01 0 1
# 4 Pakistan Islamabad 2019-01-01 1 1
# 5 Pakistan National 2019-01-01 4 4
# 6 Cameroon Nord-Ouest 2019-01-01 NA NA
# 7 India Kashmir 2019-01-02 NA NA
# 8 India Rajasthan 2019-01-02 NA NA
# 9 India Punjab 2019-01-02 NA NA
#10 Pakistan Islamabad 2019-01-02 NA NA
#11 Pakistan National 2019-01-02 NA NA
#12 Cameroon Nord-Ouest 2019-01-02 NA NA
Upvotes: 0