Reputation: 21
My partner and I have a data set that contains missing date values. The goal is to insert rows that would contain the missing dates and sites along with a 0 in the total column for these dates.
The data looks like this...
site type date total
A cat 2009-01-02 1
B cat 2009-01-02 1
A cat 2009-01-04 3
C cat 2009-01-04 1
B cat 2009-01-05 2
A cat 2009-01-06 1
B cat 2009-01-06 1
.
.
.
(And so on to the date of 2020-05-19 ...)
(Type will be the same for all data)
And we would like it to look like this...
site type date total
A cat 2009-01-01 0
B cat 2009-01-01 0
C cat 2009-01-01 0
A cat 2009-01-02 1
B cat 2009-01-02 1
C cat 2009-01-02 0
A cat 2009-01-03 0
B cat 2009-01-03 0
C cat 2009-01-03 0
A cat 2009-01-04 3
B cat 2009-01-04 0
C cat 2009-01-04 1
A cat 2009-01-05 0
B cat 2009-01-05 2
C cat 2009-01-05 0
A cat 2009-01-06 1
B cat 2009-01-06 1
C cat 2009-01-06 0
.
.
.
(And so on to the date of 2020-05-19 ...)
We have tried a few different variations of code that were kindly suggested to us, but due to still being new to R we believe we are applying it wrong to the data. An example of two of the types of codes we have tried so far are...
df %>%
mutate(date=as.Date(date)) %>%
complete(site, type, date = 2009-01-01:2020-05-19, fill = list(total = 0)) %>%
arrange(site,date)
and
df%>%
mutate(date1=as.Date(date)) %>%
group_by(site)%>%
complete(date1 = seq(min(date1), max(date1), by = "1 day")) %>%
fill(total) %>%
select(-date1)
We are still novices, so any suggestions would be greatly appreciated! Thank you.
Upvotes: 1
Views: 1129
Reputation: 3073
Your first example was almost right, you just needed to specify the sequence of dates a bit differently.
Get set up:
library(tidyverse)
set.seed(pi)
start_date = as.Date("2009-01-01")
end_date = as.Date("2020-05-19")
Create a bit of demo data for testing:
df = expand_grid(site = LETTERS[1:3],
type = "cat",
date = seq.Date(start_date, end_date, by = "days")) %>%
mutate(total = sample(x = 0:3, size = n(), replace = T,
prob = c(0.6, 0.1, 0.1, 0.1))) %>%
# Randomly remove a bunch of rows.
slice(-sample(1:n(), size = 3000))
Your first example was almost right... you just needed to specify the sequence of dates using seq.Date
instead of the :
operator. I didn't include your mutate
line since this sample data date was already in Date
format.
completed_df = df %>%
complete(site, type, date = seq.Date(start_date, end_date, by = "days"),
fill = list(total = 0)) %>%
arrange(site, date)
Upvotes: 1
Reputation: 1271
Here's an option using packages from the tidyverse:
library(dplyr)
library(tidyr)
# find all date and site combinations
all <- df %>%
mutate(date = as.Date(date)) %>%
expand(site, date = full_seq(date, 1))
# merge back all date/site combinations to original data
df %>% right_join(all, by = c("date", "site")) %>%
mutate(total = ifelse(is.na(total),
0, total),
type = "cat") %>%
arrange(date, site)
Upvotes: 0