Flag the first occurrence of a number and all rows after

Question

I have a df in R that tracks whether an individual is single (0), married (1), or divorced (99) over time.

ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5)
STATUS <- c("0", "0", "0", "1", "1", "1", "99", "99", "1", "0", "1")
df <- data.frame(ID, STATUS)
df

I would like to create a new variable that flags the first time an individual is divorced (STATUS = 99) and every row after that point. For example, under the STATUS column, ID 1 was single for three periods, then married for three periods, then divorced for two periods, and later got married again. The FLAG column marks the first 99 that appears and all rows after it within each ID.

The final product should look like:

  ID STATUS    FLAG
   1      0      0
   1      0      0
   1      0      0
   1      1      0
   1      1      0
   1      1      0
   1     99      1
   1     99      1
   1      1      1
   5      0      0
   5      1      0

tmfmnk · Accepted Answer

One possibility using dplyr:

library(dplyr)

df %>%
 group_by(ID) %>%
 mutate(flag = +(row_number() >= min(which(STATUS == 99))))

      ID STATUS  flag
 1     1 0          0
 2     1 0          0
 3     1 0          0
 4     1 1          0
 5     1 1          0
 6     1 1          0
 7     1 99         1
 8     1 99         1
 9     1 1          1
10     5 0          0
11     5 1          0
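To see why this works, the expression can be unpacked one step at a time for a single group. The sketch below uses only base R, with seq_along() standing in for what row_number() returns inside a group; the variable names (status, hits, first, status5, first5) are made up for illustration. It also shows what happens for an ID with no 99, where min() returns Inf and raises a warning.

```r
# STATUS values for ID 1, copied from the question
status <- c("0", "0", "0", "1", "1", "1", "99", "99", "1")

hits  <- which(status == 99)             # positions of every 99
first <- min(hits)                       # position of the first 99
flag  <- +(seq_along(status) >= first)   # unary + turns TRUE/FALSE into 1/0
flag
# [1] 0 0 0 0 0 0 1 1 1

# ID 5 never has a 99: which() returns integer(0), so min() gives Inf
# (with a warning) and every comparison is FALSE
status5 <- c("0", "1")
first5  <- suppressWarnings(min(which(status5 == 99)))
flag5   <- +(seq_along(status5) >= first5)
flag5
# [1] 0 0
```

So the result is still correct for IDs without a 99, but the dplyr version will print a warning for each such group.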

Or a possibility based on the solution from @markus:

df %>%
 group_by(ID) %>%
 mutate(flag = cummax(STATUS == 99))
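The cummax() version works because a logical vector is treated as 0/1, and a running maximum stays at 1 once it has seen the first TRUE. A tiny illustration with made-up vectors (is_divorced and never_divorced are hypothetical names):

```r
# TRUE marks rows where STATUS == 99
is_divorced <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)

# The running maximum switches to 1 at the first TRUE and never drops back
flag <- cummax(is_divorced)
flag
# [1] 0 0 0 1 1 1

# A group with no TRUE at all simply stays at 0, without any warning
never_divorced <- c(FALSE, FALSE)
cummax(never_divorced)
# [1] 0 0
```

Unlike min(which(...)), this does not warn for groups that never contain a 99.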

Or with base R:

# seq_along(x) indexes rows within each ID; passing the logical vector keeps flag numeric
df$flag <- ave(df$STATUS == 99, df$ID, FUN = function(x) +(seq_along(x) >= min(which(x))))
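As a quick check, the cummax() idea can also be combined with ave() in base R and compared against the FLAG column from the question. This is a variant sketched here for illustration, not part of the original answer; the expected vector is copied from the desired output in the question.

```r
# Data from the question
ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5)
STATUS <- c("0", "0", "0", "1", "1", "1", "99", "99", "1", "0", "1")
df <- data.frame(ID, STATUS)

# Running maximum of the 99-indicator, computed separately for each ID
df$flag <- ave(df$STATUS == 99, df$ID, FUN = cummax)

# FLAG column as given in the question
expected <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0)
all(df$flag == expected)
# [1] TRUE
```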
