Reputation: 545
Question: In data.table
is there any way to fill an incomplete date sequence with zeros? For instance, in the toy example some dates does not show up and I want to have a complete date sequence with y = 0
for these cases. Is there something like a forward fill?
Remark: Note I do not want to use merges where you create first the full date sequence and then merge it back to the initial data.table object (I think this is inefficient and rather slow).
library(data.table)
dt <- data.table(
x = c("2020-03-28", "2020-03-29", "2020-03-31", "2020-04-05"),
y = c(1, 5, 3, 70)
)
## Output:
x y
1: 2020-03-28 1
2: 2020-03-29 5
3: 2020-03-31 3
4: 2020-04-05 70
## Desired Output:
x y
1: 2020-03-28 1
2: 2020-03-29 5
3: 2020-03-30 0
4: 2020-03-31 3
5: 2020-04-01 0
6: 2020-04-02 0
7: 2020-04-03 0
8: 2020-04-04 0
9: 2020-04-05 70
Upvotes: 1
Views: 683
Reputation: 6489
You could also try this:
dt[, x := as.IDate(x)]
dt[.(seq(min(x), max(x), 1)), .(y = fifelse(is.na(y), 0, y)), .EACHI, on = "x"]
# x y
# 1: 2020-03-28 1
# 2: 2020-03-29 5
# 3: 2020-03-30 0
# 4: 2020-03-31 3
# 5: 2020-04-01 0
# 6: 2020-04-02 0
# 7: 2020-04-03 0
# 8: 2020-04-04 0
# 9: 2020-04-05 70
Upvotes: 0
Reputation: 35554
Use CJ
and tidyr::full_seq
to create a join data table.
dt[, x := as.Date(x)] # convert x to the Date type
dt2 <- dt[CJ(x = tidyr::full_seq(x, 1)), on = .(x)] # create the full sequence
dt2[is.na(y), y := 0] # fill NAs with 0s
dt2
# x y
# 1: 2020-03-28 1
# 2: 2020-03-29 5
# 3: 2020-03-30 0
# 4: 2020-03-31 3
# 5: 2020-04-01 0
# 6: 2020-04-02 0
# 7: 2020-04-03 0
# 8: 2020-04-04 0
# 9: 2020-04-05 70
Upvotes: 2
Reputation: 34703
How about this?
# convert to data.table's integer date type
dt[ , x := as.IDate(x)]
# find the range of dates
date_bounds = range(dt$x)
# construct a sequence of all dates
# NB: this will be integers as attributes are stripped
all_dates = date_bounds[1L]:date_bounds[2L]
# construct a table with the missing dates,
# with y filled to 0
missing = data.table(
# as.IDate uses the right origin for integer input
x = as.IDate(setdiff(all_dates, dt$x)),
y = 0
)
dt = rbind(dt, missing)
# x y
# <IDat> <num>
# 1: 2020-03-28 1
# 2: 2020-03-29 5
# 3: 2020-03-31 3
# 4: 2020-04-05 70
# 5: 2020-03-30 0
# 6: 2020-04-01 0
# 7: 2020-04-02 0
# 8: 2020-04-03 0
# 9: 2020-04-04 0
Afterwards you can setorder(dt, x)
if you want the dates to be in order
Upvotes: 4
Reputation: 388982
You could use complete
from tidyr
:
library(dplyr)
library(tidyr)
dt %>%
mutate(x = as.Date(x)) %>%
complete(x = seq(min(x), max(x), by = "day"), fill = list(y = 0))
# x y
# <date> <dbl>
#1 2020-03-28 1
#2 2020-03-29 5
#3 2020-03-30 0
#4 2020-03-31 3
#5 2020-04-01 0
#6 2020-04-02 0
#7 2020-04-03 0
#8 2020-04-04 0
#9 2020-04-05 70
Upvotes: 0