eobrien
eobrien

Reputation: 1

How to fill NA dates where 12 months is added to each successive NA?

I am trying to impute missing dates for NA values in a semi-regular date series (observations are roughly 12 months apart). I have tried nafill, but what I need is that for each successive NA an extra 12 months are added to the previous observation. I also need to to work backwards, i.e. using something like nocb, but 12 months previously. I also need to to work within the id groups.

I have data of the form:

dt <- data.table(id=sort(rep(1:3, 11)),
           wave=rep(1:11,3),
           date=seq.Date(as.Date("2009-05-01"), as.Date("2019-05-01"), by="year") %m+% days(sample(-31:31, 33, replace = T)),
           random_na=sample(0:1, 33, replace=T))
 
dt[, date:=fifelse(random_na==1, NA_Date_, date)]
dt[, random_na:=NULL]

which gives an output like

> dt
    id wave       date
 1:  1    1 2009-04-03
 2:  1    2 2010-04-25
 3:  1    3       <NA>
 4:  1    4       <NA>
 5:  1    5       <NA>
 6:  1    6 2014-04-10
 7:  1    7 2015-05-14
 8:  1    8 2016-05-10
 9:  1    9       <NA>
10:  1   10 2018-04-08
11:  1   11 2019-05-29
12:  2    1 2009-04-29
13:  2    2       <NA>
14:  2    3 2011-04-26
15:  2    4 2012-03-31
16:  2    5 2013-05-30
17:  2    6 2014-03-31
18:  2    7 2015-05-06
19:  2    8 2016-04-13
20:  2    9       <NA>
21:  2   10 2018-05-05
22:  2   11 2019-05-28
23:  3    1       <NA>
24:  3    2 2010-04-27
25:  3    3       <NA>
26:  3    4       <NA>
27:  3    5 2013-05-15
28:  3    6 2014-05-15
29:  3    7       <NA>
30:  3    8       <NA>
31:  3    9 2017-05-24
32:  3   10       <NA>
33:  3   11 2019-05-06
    id wave       date

I am trying to fill the missing dates by adding 12 months to each successive NA such that the final data looks like this:

> dt
    id wave       date
 1:  1    1 2009-04-03
 2:  1    2 2010-04-25
 3:  1    3 2011-04-25
 4:  1    4 2012-04-25
 5:  1    5 2013-04-25
 6:  1    6 2014-04-10
 7:  1    7 2015-05-14
 8:  1    8 2016-05-10
 9:  1    9 2017-05-10
10:  1   10 2018-04-08
11:  1   11 2019-05-29
12:  2    1 2009-04-29
13:  2    2 2010-04-29
14:  2    3 2011-04-26
15:  2    4 2012-03-31
16:  2    5 2013-05-30
17:  2    6 2014-03-31
18:  2    7 2015-05-06
19:  2    8 2016-04-13
20:  2    9 2017-04-13
21:  2   10 2018-05-05
22:  2   11 2019-05-28
23:  3    1 2009-04-27
24:  3    2 2010-04-27
25:  3    3 2011-04-27
26:  3    4 2012-04-27
27:  3    5 2013-05-15
28:  3    6 2014-05-15
29:  3    7 2015-05-15
30:  3    8 2016-05-15
31:  3    9 2017-05-24
32:  3   10 2018-05-24
33:  3   11 2019-05-06
    id wave       date

Many thanks in advance!

Upvotes: 0

Views: 44

Answers (1)

r2evans
r2evans

Reputation: 160447

Try this:

dt[, grp := cumsum(!is.na(date)), by = .(id)
  ][, date := fifelse(is.na(date), first(date) + 365 * (seq_len(.N) - 1), date), 
    by = .(id, grp)
  ][, grp := NULL][]
#        id  wave       date
#     <int> <int>     <IDat>
#  1:     1     1 2009-04-03
#  2:     1     2 2010-04-25
#  3:     1     3 2011-04-25
#  4:     1     4 2012-04-24
#  5:     1     5 2013-04-24
#  6:     1     6 2014-04-10
#  7:     1     7 2015-05-14
#  8:     1     8 2016-05-10
#  9:     1     9 2017-05-10
# 10:     1    10 2018-04-08
# ---                       
# 24:     3     2 2010-04-27
# 25:     3     3 2011-04-27
# 26:     3     4 2012-04-26
# 27:     3     5 2013-05-15
# 28:     3     6 2014-05-15
# 29:     3     7 2015-05-15
# 30:     3     8 2016-05-14
# 31:     3     9 2017-05-24
# 32:     3    10 2018-05-24
# 33:     3    11 2019-05-06

Data

dt <- setDT(structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), wave = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), date = structure(c(14337L, 14724L, NA, NA, NA, 16170L, 16569L, 16931L, NA, 17629L, 18045L, 14363L, NA, 15090L, 15430L, 15855L, 16160L, 16561L, 16904L, NA, 17656L, 18044L, NA, 14726L, NA, NA, 15840L,  16205L, NA, NA, 17310L, NA, 18022L), class = c("IDate", "Date"))), row.names = c(NA, -33L), class = c("data.table", "data.frame")))

Upvotes: 0

Related Questions