Dylan Wood
Dylan Wood

Reputation: 29

Conditionally impute values in a group based on existing values in R

After grouping my data, I would like to impute values of a column in both directions based on one or more existing values. I have the following data:

id   inw    test    found
001    0       1     <NA>
001    1       2     <NA>
001    1       3       No
001    1       4       No
002    1       1       No
002    0       2     <NA>
002    1       3      Yes
002    1       4       No
003    1       1      Yes
003    1       2      Yes
003    1       3       No
003    0       4     <NA>

The data is grouped by id and the found column is in question here. Essentially, if a value of 'No' appears within a group, all previous values should be changed to 'No'. If 'Yes' appears, all subsequent values should change to 'Yes'.

Therefore, the final result should look like this:

id   inw    test    found
001    0       1       No
001    1       2       No
001    1       3       No
001    1       4       No
002    1       1       No
002    0       2     <NA>
002    1       3      Yes
002    1       4      Yes
003    1       1      Yes
003    1       2      Yes
003    1       3      Yes
003    0       4      Yes

Critically, NA's can remain if there is not a 'No' value anywhere after it it, or no 'Yes' values anywhere before it within the group (highlighted above in group id == 002).

Any help would be greatly appreciated!

Thanks, Dylan

Upvotes: 0

Views: 126

Answers (2)

AnilGoyal
AnilGoyal

Reputation: 26218

My proposed solution without the need of creating any user defined function

library(tidyverse)

df %>% group_by(id) %>%
  mutate(dummy_id = row_number(), #Not sure that there are gaps between test column, thus created one
         coln = ifelse(is.na(first(which(found == 'No'))), 0, first(which(found == 'No'))),
         coly = ifelse(is.na(first(which(found == 'Yes'))), n()+1, first(which(found == 'Yes'))),
         found2 = ifelse(dummy_id >= coly, 'Yes', 
                         ifelse(dummy_id <= coln, 'No', found))) %>%
  select(-dummy_id, -coln, -coly) %>%
  ungroup()

# A tibble: 12 x 5
      id   inw  test found found2
   <int> <int> <int> <chr> <chr> 
 1     1     0     1 NA    No    
 2     1     1     2 NA    No    
 3     1     1     3 No    No    
 4     1     1     4 No    No    
 5     2     1     1 No    No    
 6     2     0     2 NA    NA    
 7     2     1     3 Yes   Yes   
 8     2     1     4 No    Yes   
 9     3     1     1 Yes   Yes   
10     3     1     2 Yes   Yes   
11     3     1     3 No    Yes   
12     3     0     4 NA    Yes 

Upvotes: 0

ekoam
ekoam

Reputation: 8844

Something like this?

library(dplyr)

fill_na <- function(x) {
  if (length(first_yes <- head(which(x == "Yes"), 1L)) > 0L)
    x[seq.int(first_yes, length(x), 1L)] <- "Yes"
  if (length(last_no <- tail(which(x == "No"), 1L)) > 0L)
    x[seq.int(1L, last_no, 1L)] <- "No"
  x
}

df %>% group_by(id) %>% mutate(found = fill_na(found))

Output

# A tibble: 12 x 4
# Groups:   id [3]
   id      inw  test found
   <chr> <dbl> <dbl> <chr>
 1 001       0     1 No   
 2 001       1     2 No   
 3 001       1     3 No   
 4 001       1     4 No   
 5 002       1     1 No   
 6 002       0     2 NA   
 7 002       1     3 Yes  
 8 002       1     4 Yes  
 9 003       1     1 Yes  
10 003       1     2 Yes  
11 003       1     3 Yes  
12 003       0     4 Yes 

Upvotes: 1

Related Questions