Santana
Santana

Reputation: 21

How to input missing date rows in R with a value of 0 (data out of order)

My partner and I have a data set that contains missing date values. The goal is to insert rows that would contain the missing dates and sites along with a 0 in the total column for these dates.

The data looks like this...

site         type        date            total
A            cat        2009-01-02         1
B            cat        2009-01-02         1           
A            cat        2009-01-04         3
C            cat        2009-01-04         1
B            cat        2009-01-05         2
A            cat        2009-01-06         1
B            cat        2009-01-06         1
.
.
.

(And so on to the date of 2020-05-19 ...)

(Type will be the same for all data)

And we would like it to look like this...

site         type        date            total
A            cat        2009-01-01         0
B            cat        2009-01-01         0 
C            cat        2009-01-01         0 
A            cat        2009-01-02         1
B            cat        2009-01-02         1 
C            cat        2009-01-02         0
A            cat        2009-01-03         0
B            cat        2009-01-03         0 
C            cat        2009-01-03         0           
A            cat        2009-01-04         3
B            cat        2009-01-04         0
C            cat        2009-01-04         1
A            cat        2009-01-05         0
B            cat        2009-01-05         2
C            cat        2009-01-05         0
A            cat        2009-01-06         1
B            cat        2009-01-06         1
C            cat        2009-01-06         0
.
.
.
(And so on to the date of 2020-05-19 ...)

We have tried a few different variations of code that were kindly suggested to us, but due to still being new to R we believe we are applying it wrong to the data. An example of two of the types of codes we have tried so far are...

df %>%

mutate(date=as.Date(date)) %>%

complete(site, type, date = 2009-01-01:2020-05-19, fill = list(total = 0)) %>%

arrange(site,date)

and

df%>%

mutate(date1=as.Date(date)) %>%

group_by(site)%>%

complete(date1 = seq(min(date1), max(date1), by = "1 day")) %>%

fill(total) %>%

select(-date1)

We are still novices, so any suggestions would be greatly appreciated! Thank you.

Upvotes: 1

Views: 1129

Answers (2)

D. Woods
D. Woods

Reputation: 3073

Your first example was almost right, you just needed to specify the sequence of dates a bit differently.

Get set up:

library(tidyverse)
set.seed(pi)

start_date = as.Date("2009-01-01")
end_date = as.Date("2020-05-19")

Create a bit of demo data for testing:

df = expand_grid(site = LETTERS[1:3], 
                 type = "cat", 
                 date = seq.Date(start_date, end_date, by = "days")) %>% 
  mutate(total = sample(x = 0:3, size = n(), replace = T, 
                        prob = c(0.6, 0.1, 0.1, 0.1))) %>% 
  # Randomly remove a bunch of rows.
  slice(-sample(1:n(), size = 3000))

Your first example was almost right... you just needed to specify the sequence of dates using seq.Date instead of the : operator. I didn't include your mutate line since this sample data date was already in Date format.

completed_df = df %>%
  complete(site, type, date = seq.Date(start_date, end_date, by = "days"), 
           fill = list(total = 0)) %>%
  arrange(site, date)

Upvotes: 1

Jay Achar
Jay Achar

Reputation: 1271

Here's an option using packages from the tidyverse:

library(dplyr)
library(tidyr)

# find all date and site combinations
all <- df %>% 
  mutate(date = as.Date(date))  %>% 
  expand(site, date = full_seq(date, 1)) 

# merge back all date/site combinations to original data
df %>% right_join(all, by = c("date", "site")) %>% 
  mutate(total = ifelse(is.na(total), 
                        0, total),
         type = "cat") %>% 
  arrange(date, site)

Upvotes: 0

Related Questions