rfortin
rfortin

Reputation: 184

Fill in gaps between two dates in a dataframe in r

I want to fill in the 'NAs' between two values in a group with the status of the first value. The group is the "label" field, and I want to fill in the 'NA's' in the "status" field.

Here is what I have:

     datetime                                      title    label   status           option_title
1  2016-08-06                            Pursuit status:      AIG   Active1                   <NA>
2  2016-08-06 What is the current stage of this Pursuit?      AIG     <NA> 1 - Opportunity Review
3  2016-08-31 What is the current stage of this Pursuit?      AIG     <NA>    2 - Solution Review
4  2016-12-13 What is the current stage of this Pursuit?      AIG     <NA>    4 - Submit Proposal
5  2016-11-14                            Pursuit status:  Allianz   Active1                   <NA>
6  2016-10-27 What is the current stage of this Pursuit?  Allianz     <NA>      Pre-Qualification
7  2017-05-18 What is the current stage of this Pursuit?  Allianz     <NA> 1 - Opportunity Review
8  2017-05-18 What is the current stage of this Pursuit?  Allianz     <NA>      Pre-Qualification
9  2017-08-24                            Pursuit status:  Allianz Inactive1                   <NA>
10 2016-10-27 What is the current stage of this Pursuit?  Allianz     <NA>      Pre-Qualification
11 2016-11-14                            Pursuit status:  Allianz   Active2                   <NA>
12 2016-12-19 What is the current stage of this Pursuit?  Allianz     <NA> 1 - Opportunity Review
13 2017-04-14 What is the current stage of this Pursuit?  Allianz     <NA>    2 - Solution Review

Here is what I want:

         datetime                                      title    label   status           option_title
1  2016-08-06                            Pursuit status:      AIG   Active1                   <NA>
2  2016-08-06 What is the current stage of this Pursuit?      AIG   Active1 1 - Opportunity Review    
3  2016-08-31 What is the current stage of this Pursuit?      AIG   Active1    2 - Solution Review
4  2016-12-13 What is the current stage of this Pursuit?      AIG   Active1    4 - Submit Proposal
5  2016-11-14                            Pursuit status:  Allianz   Active1                   <NA>
5  2016-10-27 What is the current stage of this Pursuit?  Allianz   Active1      Pre-Qualification
7  2017-05-18 What is the current stage of this Pursuit?  Allianz   Active1 1 - Opportunity Review
8  2017-05-18 What is the current stage of this Pursuit?  Allianz   Active1      Pre-Qualification
9  2017-08-24                            Pursuit status:  Allianz Inactive1                   <NA>
10 2016-10-27 What is the current stage of this Pursuit?  Allianz Inactive1      Pre-Qualification
11 2016-11-14                            Pursuit status:  Allianz   Active2                   <NA>
12 2016-12-19 What is the current stage of this Pursuit?  Allianz   Active2 1 - Opportunity Review
13 2017-04-14 What is the current stage of this Pursuit?  Allianz   Active2    2 - Solution Review

Is there a way to do this? I'm thinking the best way would be to get the date of the first status and the date of the second status, and fill in all of the values in-between with the first status.

Upvotes: 0

Views: 794

Answers (1)

leerssej
leerssej

Reputation: 14958

A tidyr approach:

It appears you have already solved the ordering issue, so here is code to replicate a minimal example of your dataset.

df_orig <- 
    read.table(text = "
             label   status 
               AIG   Active1
               AIG     <NA> 
               AIG     <NA> 
               AIG     <NA> 
           Allianz   Active1
           Allianz     <NA> 
           Allianz     <NA>
           Allianz     <NA> 
           Allianz Inactive1
           Allianz     <NA> 
           Allianz   Active2
           Allianz     <NA> 
           Allianz     <NA> 
                  ", header = TRUE, stringsAsFactors = FALSE) %>% 
    mutate(status = sub("<NA>", NA, status)) # to turn the "<NA>" to active NA's
> str(df_orig)
'data.frame': 13 obs. of  2 variables:
 $ label : chr  "AIG" "AIG" "AIG" "AIG" ...
 $ status: chr  "Active1" NA NA NA ...

Filling into the 'empties'

df_filled <-  
    df_orig %>% 
    fill(status)

The result:

> df_filled
     label    status
1      AIG   Active1
2      AIG   Active1
3      AIG   Active1
4      AIG   Active1
5  Allianz   Active1
6  Allianz   Active1
7  Allianz   Active1
8  Allianz   Active1
9  Allianz Inactive1
10 Allianz Inactive1
11 Allianz   Active2
12 Allianz   Active2
13 Allianz   Active2

Upvotes: 1

Related Questions