Filling NA data with data from another row with matching

Question

Here is example data:

df = data.frame(id = (1:5),
                type= c("a_type","a_type","b_type","b_type", "c_type"),
                start_date= lubridate::dmy(c("01/01/2014", "30/04/2014
", "30/04/2015", "10/05/2015", "30/03/2016")),
                fail_date = lubridate::dmy(c("30/04/2015", rep(NA,4))))
> df
  id   type start_date  fail_date
1  1 a_type 2014-01-01 2015-04-30
2  1 a_type 2014-04-30       
3  1 b_type 2015-04-30       
4  1 b_type 2015-05-10       
5  1 c_type 2016-03-30

I want to fill in fail_date of "a_type" where they are NA. This needs to be the start_date of the next type that is "c_type" or "a_type". So output should be:

> df1
  id   type start_date  fail_date
1  1 a_type 2014-01-01 2015-04-30
2  2 a_type 2014-04-30 2016-03-30
3  3 b_type 2015-04-30       
4  4 b_type 2015-05-10       
5  5 c_type 2016-03-30

I have started with this code, but don't know how to specify the start_date of the next entry that is type "c_type" or "a_type":

df1= df%>%
  mutate(fail_date = case_when(is.na(fail_date) & type =="a_type" ~ ??? ,
                               TRUE ~ fail_date))

I do not want to just filter out the "b_type"

#######Edit##############

Following the answer given by @Allan Cameron I have tried this on my data but have come across an issue.

If the last type is "b-type" the code won't work, as the cumsum(rev(df$type) != 'b_type') starts with a 0. Here is an example with amended data:

df = data.frame(id = (1:6),
                type= c("a_type","a_type","b_type","b_type", "c_type","b_type"),
                start_date= lubridate::dmy(c("01/12/2013","01/01/2014", "30/04/2014
", "30/04/2015", "10/05/2015", "30/03/2016")),
                fail_date =rep(NA,6))

df1= df%>%
  arrange(start_date) %>%
  mutate(next_start = lead(rev(rev(start_date[type != 'b_type'])[
    cumsum(rev(type) != 'b_type')])),
    fail_date = if_else(type == 'a_type' & 
                          is.na(fail_date) ,
                        next_start, 
                        lubridate::dmy(fail_date)))

Error: Problem with `mutate()` column `next_start`.
i `next_start = lead(...)`.
i `next_start` must be size 6 or 1, not 5.

Here are some of the individual elements which have helped me to understand the error, but I dont' know how to overcome this:

df1= df%>%
  mutate(         rev =  rev(type),
         qa = rev(type) != 'b_type',
         qw= cumsum(rev(type)!='b_type')
         )

df1
  id   type start_date fail_date    rev    qa qw
1  1 a_type 2013-12-01        NA b_type FALSE  0
2  2 a_type 2014-01-01        NA c_type  TRUE  1
3  3 b_type 2014-04-30        NA b_type FALSE  1
4  4 b_type 2015-04-30        NA b_type FALSE  1
5  5 c_type 2015-05-10        NA a_type  TRUE  2
6  6 b_type 2016-03-30        NA a_type  TRUE  3

Because the first entry in qw is 0, rev(rev(start_date[type != 'b_type'])[ cumsum(rev(type) != 'b_type')]) does not produce a result.

Filling NA data with data from another row with matching

Answers (1)

Related Questions