Reputation: 173
I have a data frame df in R that tracks whether an individual is single (0), married (1), or divorced (99) over time.
ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5)
STATUS <- c("0", "0", "0", "1", "1", "1", "99", "99", "1", "0", "1")
df <- data.frame(ID, STATUS)
df
I would like to create a new variable that flags the first time the individual is divorced (STATUS = 99) and every row after that point. For example, in the STATUS column, ID 1 was single for three periods, then married for three periods, then divorced for two periods, and later got married again. The FLAG column marks the first 99 that appears and all rows after it for each ID.
The final product should look like:
ID STATUS FLAG
1 0 0
1 0 0
1 0 0
1 1 0
1 1 0
1 1 0
1 99 1
1 99 1
1 1 1
5 0 0
5 1 0
Upvotes: 3
Views: 857
Reputation: 39858
One possibility using dplyr:
library(dplyr)

df %>%
  group_by(ID) %>%
  # flag rows at or after the first 99 within each ID
  mutate(flag = +(row_number() >= min(which(STATUS == 99))))
ID STATUS flag
<dbl> <fct> <dbl>
1 1. 0 0.
2 1. 0 0.
3 1. 0 0.
4 1. 1 0.
5 1. 1 0.
6 1. 1 0.
7 1. 99 1.
8 1. 99 1.
9 1. 1 1.
10 5. 0 0.
11 5. 1 0.
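To see what the expression inside mutate() does, here is a rough base R sketch for a single group (the ID 1 rows from the question). The names status, first_99 and at_or_after are only illustrative, and seq_along() stands in for dplyr's row_number().

```r
# STATUS values for ID 1, copied from the question
status <- c("0", "0", "0", "1", "1", "1", "99", "99", "1")

# which() gives the positions where STATUS is 99; min() keeps the first one
first_99 <- min(which(status == 99))

# TRUE for every row at or after that position
at_or_after <- seq_along(status) >= first_99

# unary plus turns the logical vector into 0/1
flag <- +at_or_after
flag
```

Inside group_by(ID), the same steps run separately for each ID.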
Or a possibility based on the solution from @markus:
df %>%
  group_by(ID) %>%
  mutate(flag = cummax(STATUS == 99))
Or with base R:
# seq_along(x) counts rows within each ID; 1:nrow(df) would span the whole data frame
df$flag <- ave(df$STATUS, df$ID, FUN = function(x) +(seq_along(x) >= min(which(x == 99))))
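One caveat with the min(which(...)) versions: for an ID that never has a 99 (ID 5 here), which() returns an empty vector, so min() returns Inf with a warning. The flag still comes out as 0, but if the warning bothers you, something along these lines might work. This is only a sketch: flag_from_first is a hypothetical helper name, and it passes row indices to ave() so that the result is an integer column rather than taking the type of STATUS.

```r
ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5)
STATUS <- c("0", "0", "0", "1", "1", "1", "99", "99", "1", "0", "1")
df <- data.frame(ID, STATUS)

# Hypothetical helper: 1 from the first occurrence of `target` onwards, else 0
flag_from_first <- function(x, target = "99") {
  first <- match(target, x)                       # NA if target never occurs
  if (is.na(first)) return(integer(length(x)))    # all zeros, no warning
  as.integer(seq_along(x) >= first)
}

# ave() splits the row indices by ID; FUN looks up STATUS for those rows
df$flag <- ave(seq_len(nrow(df)), df$ID,
               FUN = function(i) flag_from_first(df$STATUS[i]))
df
```

match() returns the position of the first match only, which is the "first 99" asked for in the question.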
Upvotes: 4
Reputation: 26343
We can use cummax by group:
df$FLAG <- with(df, ave(STATUS, ID, FUN = function(x) cummax(x == 99)))
df
# ID STATUS FLAG
#1 1 0 0
#2 1 0 0
#3 1 0 0
#4 1 1 0
#5 1 1 0
#6 1 1 0
#7 1 99 1
#8 1 99 1
#9 1 1 1
#10 5 0 0
#11 5 1 0
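A small illustration of why cummax works here, on a made-up vector (x and hit are just example names): the comparison gives FALSE/TRUE, and the cumulative maximum stays at 1 once the first TRUE has been seen.

```r
x <- c("0", "1", "99", "99", "1")

# TRUE where the status is 99
hit <- x == 99
hit

# cummax treats FALSE/TRUE as 0/1 and never drops back after the first 1
cummax(hit)
```

Because the running maximum cannot decrease, a later return to status 1 keeps the flag at 1, which matches the expected output in the question.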
Upvotes: 4