Reputation: 1
I am trying to impute missing dates for NA values in a semi-regular date series (observations are roughly 12 months apart). I have tried nafill
, but what I need is that for each successive NA an extra 12 months are added to the previous observation. I also need to to work backwards, i.e. using something like nocb
, but 12 months previously. I also need to to work within the id
groups.
I have data of the form:
dt <- data.table(id=sort(rep(1:3, 11)),
wave=rep(1:11,3),
date=seq.Date(as.Date("2009-05-01"), as.Date("2019-05-01"), by="year") %m+% days(sample(-31:31, 33, replace = T)),
random_na=sample(0:1, 33, replace=T))
dt[, date:=fifelse(random_na==1, NA_Date_, date)]
dt[, random_na:=NULL]
which gives an output like
> dt
id wave date
1: 1 1 2009-04-03
2: 1 2 2010-04-25
3: 1 3 <NA>
4: 1 4 <NA>
5: 1 5 <NA>
6: 1 6 2014-04-10
7: 1 7 2015-05-14
8: 1 8 2016-05-10
9: 1 9 <NA>
10: 1 10 2018-04-08
11: 1 11 2019-05-29
12: 2 1 2009-04-29
13: 2 2 <NA>
14: 2 3 2011-04-26
15: 2 4 2012-03-31
16: 2 5 2013-05-30
17: 2 6 2014-03-31
18: 2 7 2015-05-06
19: 2 8 2016-04-13
20: 2 9 <NA>
21: 2 10 2018-05-05
22: 2 11 2019-05-28
23: 3 1 <NA>
24: 3 2 2010-04-27
25: 3 3 <NA>
26: 3 4 <NA>
27: 3 5 2013-05-15
28: 3 6 2014-05-15
29: 3 7 <NA>
30: 3 8 <NA>
31: 3 9 2017-05-24
32: 3 10 <NA>
33: 3 11 2019-05-06
id wave date
I am trying to fill the missing dates by adding 12 months to each successive NA such that the final data looks like this:
> dt
id wave date
1: 1 1 2009-04-03
2: 1 2 2010-04-25
3: 1 3 2011-04-25
4: 1 4 2012-04-25
5: 1 5 2013-04-25
6: 1 6 2014-04-10
7: 1 7 2015-05-14
8: 1 8 2016-05-10
9: 1 9 2017-05-10
10: 1 10 2018-04-08
11: 1 11 2019-05-29
12: 2 1 2009-04-29
13: 2 2 2010-04-29
14: 2 3 2011-04-26
15: 2 4 2012-03-31
16: 2 5 2013-05-30
17: 2 6 2014-03-31
18: 2 7 2015-05-06
19: 2 8 2016-04-13
20: 2 9 2017-04-13
21: 2 10 2018-05-05
22: 2 11 2019-05-28
23: 3 1 2009-04-27
24: 3 2 2010-04-27
25: 3 3 2011-04-27
26: 3 4 2012-04-27
27: 3 5 2013-05-15
28: 3 6 2014-05-15
29: 3 7 2015-05-15
30: 3 8 2016-05-15
31: 3 9 2017-05-24
32: 3 10 2018-05-24
33: 3 11 2019-05-06
id wave date
Many thanks in advance!
Upvotes: 0
Views: 44
Reputation: 160447
Try this:
dt[, grp := cumsum(!is.na(date)), by = .(id)
][, date := fifelse(is.na(date), first(date) + 365 * (seq_len(.N) - 1), date),
by = .(id, grp)
][, grp := NULL][]
# id wave date
# <int> <int> <IDat>
# 1: 1 1 2009-04-03
# 2: 1 2 2010-04-25
# 3: 1 3 2011-04-25
# 4: 1 4 2012-04-24
# 5: 1 5 2013-04-24
# 6: 1 6 2014-04-10
# 7: 1 7 2015-05-14
# 8: 1 8 2016-05-10
# 9: 1 9 2017-05-10
# 10: 1 10 2018-04-08
# ---
# 24: 3 2 2010-04-27
# 25: 3 3 2011-04-27
# 26: 3 4 2012-04-26
# 27: 3 5 2013-05-15
# 28: 3 6 2014-05-15
# 29: 3 7 2015-05-15
# 30: 3 8 2016-05-14
# 31: 3 9 2017-05-24
# 32: 3 10 2018-05-24
# 33: 3 11 2019-05-06
Data
dt <- setDT(structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), wave = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), date = structure(c(14337L, 14724L, NA, NA, NA, 16170L, 16569L, 16931L, NA, 17629L, 18045L, 14363L, NA, 15090L, 15430L, 15855L, 16160L, 16561L, 16904L, NA, 17656L, 18044L, NA, 14726L, NA, NA, 15840L, 16205L, NA, NA, 17310L, NA, 18022L), class = c("IDate", "Date"))), row.names = c(NA, -33L), class = c("data.table", "data.frame")))
Upvotes: 0