Determine missing values based on lagged grouped values

Question

I need to fill missing values based on previous and/or forward values, based on grouping. I would like to accomplish this using dplyr (although data.table solutions will also be welcomed).

Sample data:

testing <- tibble(key = c(10,10,10,10,10,10,20,20,20,20,20,20),
                  year = c(15,15,16,16,17,17,15,15,16,16,17,17),
                  name = c("abc","abc","","","dfg","dfg",
                          "","","nmm","nmm","",""),
                  is_name = c(1,1,0,0,1,1,0,0,0,0,0,0))

     key  year  name is_name
        
1     10    15   abc       1
2     10    15   abc       1
3     10    16             0
4     10    16             0
5     10    17   dfg       1
6     10    17   dfg       1
7     20    15             0
8     20    15             0
9     20    16   nmm       0
10    20    16   nmm       0
11    20    17             0
12    20    17             0

I want to fill missing names (name) in a way that if the previous year of the same key is marked as is_name==1, than fill the missing with it. So an output can be:

     key  year  name is_name name_new
            
1     10    15   abc       1      abc
2     10    15   abc       1      abc
3     10    16             0      abc
4     10    16             0      abc
5     10    17   dfg       1      dfg
6     10    17   dfg       1      dfg
7     20    15             0         
8     20    15             0         
9     20    16   nmm       0      nmm
10    20    16   nmm       0      nmm
11    20    17             0         
12    20    17             0

I tried to use lag and leap, but it didn't move beyond the group (key)properly.

Thanks!

Matt Jewett · Accepted Answer

This might work for you

library(dplyr)
library(zoo)

testing <- testing %>%
           arrange(key, year) %>%
           mutate(name = ifelse(name == "", NA, name),
                  is_name = ifelse(is_name == 0, NA, is_name)) %>%
           group_by(key) %>%
           mutate(newname = ifelse((is.na(name) & na.locf(is_name, na.rm = FALSE) == 1), na.locf(name, na.rm = FALSE), name),
                  is_name = ifelse(is.na(is_name),0,is_name))

Determine missing values based on lagged grouped values

Answers (1)

Related Questions