pd441
pd441

Reputation: 2773

R: set leading row values to NA depending on condition

Here is my example dataframe:

df <- tibble( id = c(rep('a', 5), rep('b', 6), rep('c', 6)),
  event = c("Visit 1", "Visit 2", 
                "Visit 3", "Visit 4", "Visit 5",  
                "Visit 1", "Visit 2", "Visit 3",
                NA, "Visit 4", "Visit 5",
                "Visit 1", NA, "visit 2", 
                "Visit 3","Visit 4", "Visit 5"),
             expected_output = c("Visit 1", "Visit 2", 
                                 "Visit 3", "Visit 4", "Visit 5",  
                                 "Visit 1", "Visit 2", "Visit 3",
                                 NA,  NA,  NA, "Visit 1", NA, NA, NA, NA, NA))

I would like to create a new column where, grouped by id, whenever an NA occurs in the column event then all subsequnt rows should be set to NA. as can be seen in the expected_output column.

Here is my attempt, creating a new column expected_output_b using the lead function, however it doesn't work. Can anyone else help with this as unfortunately I can't think of another way to approach it.

df <- df %>% group_by(id) %>% mutate( expected_output_b = if_else(is.na(event), all(lead(event)) == NA, event))

Thanks!

Upvotes: 0

Views: 70

Answers (2)

Wimpel
Wimpel

Reputation: 27792

a data.table approach

library(data.table)
setDT(df)[, new := ifelse(df[, cumsum(is.na(event)) > 0, by = .(id)]$V1, NA, event)]
#    id   event expected_output     new
# 1:  a Visit 1         Visit 1 Visit 1
# 2:  a Visit 2         Visit 2 Visit 2
# 3:  a Visit 3         Visit 3 Visit 3
# 4:  a Visit 4         Visit 4 Visit 4
# 5:  a Visit 5         Visit 5 Visit 5
# 6:  b Visit 1         Visit 1 Visit 1
# 7:  b Visit 2         Visit 2 Visit 2
# 8:  b Visit 3         Visit 3 Visit 3
# 9:  b    <NA>            <NA>    <NA>
#10:  b Visit 4            <NA>    <NA>
#11:  b Visit 5            <NA>    <NA>
#12:  c Visit 1         Visit 1 Visit 1
#13:  c    <NA>            <NA>    <NA>
#14:  c visit 2            <NA>    <NA>
#15:  c Visit 3            <NA>    <NA>
#16:  c Visit 4            <NA>    <NA>
#17:  c Visit 5            <NA>    <NA>

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389315

You can use replace grouped by id -

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(result  = replace(event, row_number() > which(is.na(event))[1], NA)) %>%
  ungroup

#   id    event   result 
#   <chr> <chr>   <chr>  
# 1 a     Visit 1 Visit 1
# 2 a     Visit 2 Visit 2
# 3 a     Visit 3 Visit 3
# 4 a     Visit 4 Visit 4
# 5 a     Visit 5 Visit 5
# 6 b     Visit 1 Visit 1
# 7 b     Visit 2 Visit 2
# 8 b     Visit 3 Visit 3
# 9 b     NA      NA     
#10 b     Visit 4 NA     
#11 b     Visit 5 NA     
#12 c     Visit 1 Visit 1
#13 c     NA      NA     
#14 c     visit 2 NA     
#15 c     Visit 3 NA     
#16 c     Visit 4 NA     
#17 c     Visit 5 NA 

Upvotes: 0

Related Questions