ks54
ks54

Reputation: 85

How do I add calendar dates to an existing data table so that an entire month is accounted for in R?

I need to "back-fill" dates for a 3 year time period (2016-2018) and My actual data has over 32,000 observations and is a table of dates, countries, regions within a country, and protests and riot events.

For simplicity, let's say the data is limited to the month January with 3 countries. I want to eventually do a lagged data panel, but to do so I need all of the dates for the time period accounted for (I think).

DT <- data.table(Date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-12", "2019-01-08" )),
             Country = c("India","India","India","Pakistan","Pakistan", "Cameroon", "India"),
             Region = c('Kashmir', 'Rajasthan', 'Punjab', 'Islamabad', 'National', 'Nord-Ouest', "Kashmir"),
             Protest = c(4,2,0,1,4,1,0 ),
             Riot = c(0,2,1,1,4,1,1 ))

# Date  Country     Region Protest Riot
# 1: 2019-01-01    India    Kashmir       4    0
# 2: 2019-01-01    India  Rajasthan       2    2
# 3: 2019-01-01    India     Punjab       0    1
# 4: 2019-01-01 Pakistan  Islamabad       1    1
# 5: 2019-01-01 Pakistan   National       4    4
# 6: 2019-01-12 Cameroon Nord-Ouest       1    1
# 7: 2019-01-08    India    Kashmir       0    1

I could make and merge a new data table with dates for the month of January and count out the number of repetitions for each country, but that is not feasible for 32 Countries and their regions. Is there a way to account for countries having different numbers of entry for any given date and then filling in rows so that (in this case) each country and region would have a date represented for everyday of the month? a desired output would be along the lines of:

# Date  Country     Region Protest Riot
# 1: 2019-01-01    India    Kashmir       4    0
# 2: 2019-01-01    India  Rajasthan       2    2
# 3: 2019-01-01    India     Punjab       0    1
# 4: 2019-01-01 Pakistan  Islamabad       1    1
# 5: 2019-01-01 Pakistan   National       4    4
# 6: 2019-01-01 Cameroon Nord-Ouest       NA   NA
# 7: 2019-01-02    India    Kashmir       NA   NA
# 8: 2019-01-02    India  Rajasthan       NA   NA
# 9: 2019-01-02    India     Punjab       NA   Na
# 10: 2019-01-02 Pakistan  Islamabad      NA   NA
# 11: 2019-01-02 Pakistan   National      NA   NA
# 12: 2019-01-02 Cameroon Nord-Ouest      NA   NA

Upvotes: 1

Views: 120

Answers (1)

akrun
akrun

Reputation: 887223

May be we can use complete

library(tidyverse)
DT %>% 
   mutate_if(is.character, ~ factor(., levels = unique(.))) %>% 
   group_by(Country, Region) %>%
   complete(Date = seq(min(.$Date), max(.$Date), by = 'day')) %>% 
   ungroup %>% 
   arrange(Date) %>%
   head(12)
# A tibble: 12 x 5
#   Country  Region     Date       Protest  Riot
#   <fct>    <fct>      <date>       <dbl> <dbl>
# 1 India    Kashmir    2019-01-01       4     0
# 2 India    Rajasthan  2019-01-01       2     2
# 3 India    Punjab     2019-01-01       0     1
# 4 Pakistan Islamabad  2019-01-01       1     1
# 5 Pakistan National   2019-01-01       4     4
# 6 Cameroon Nord-Ouest 2019-01-01      NA    NA
# 7 India    Kashmir    2019-01-02      NA    NA
# 8 India    Rajasthan  2019-01-02      NA    NA
# 9 India    Punjab     2019-01-02      NA    NA
#10 Pakistan Islamabad  2019-01-02      NA    NA
#11 Pakistan National   2019-01-02      NA    NA
#12 Cameroon Nord-Ouest 2019-01-02      NA    NA

Upvotes: 0

Related Questions