Find first date of treatment in R

Question

I have some panel data with id, year and a variable indicating whether the individuals are treated at that point in time:

id  year   treated  
1   2000      0            
1   2001      0            
1   2002      1            
1   2003      1            
1   2004      1

I need to create a dummy to indicate the year in which the treatment first happened. The desired output is something like:

id  year   treated   treatment_year
1   2000      0            0
1   2001      0            0
1   2002      1            1
1   2003      1            0
1   2004      1            0

It seems fairly simple, but I've been stuck on it for a while and can't get any ordering function to do this. Thanks a lot for any help.
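To make the examples reproducible, the panel above can be entered as a data frame. The name df is what the answers below use. Storing the columns as integers is an assumption, since the question does not say how they are stored.

```r
# The example panel from the question, stored as a data frame named df.
# Integer columns are an assumption; numeric (double) columns work the same way.
df <- data.frame(
  id      = c(1L, 1L, 1L, 1L, 1L),
  year    = 2000:2004,
  treated = c(0L, 0L, 1L, 1L, 1L)
)
df
```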

Ronak Shah · Accepted Answer

You can use match to get the index of the first 1 in each id and replace every other value with 0.

This can be done using dplyr:

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(treatment_year = replace(treated, -match(1L, treated), 0L))
  # Alternatively (this returns NA for an id with no 1):
  # mutate(treatment_year = +(row_number() == match(1L, treated)))

#     id  year treated treatment_year
#1     1  2000       0              0
#2     1  2001       0              0
#3     1  2002       1              1
#4     1  2003       1              0
#5     1  2004       1              0

In base R:

df$treatment_year <- with(df, ave(treated, id, FUN = function(x) 
                          replace(x, -match(1L, x), 0L)))

And with data.table:

library(data.table)
setDT(df)[, treatment_year := replace(treated, -match(1L, treated), 0L), id]
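All three versions above assume that the rows within each id are already sorted by year, because match finds the first row containing a 1, not the earliest year. A minimal base R sketch under that caveat sorts first and then applies the same ave/replace approach. The data frame df2 and ids 2 and 3 are made up for illustration: id 2 is deliberately out of year order and id 3 is never treated.

```r
# Hypothetical panel: id 1 is the question's data, id 2 has its rows out of
# year order, and id 3 is never treated.
df2 <- data.frame(
  id      = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  year    = c(2000, 2001, 2002, 2003, 2004, 2002, 2000, 2001, 2000, 2001),
  treated = c(0, 0, 1, 1, 1, 1, 0, 1, 0, 0)
)

# match() picks the first *row* with a 1, so put each id's rows in year order.
df2 <- df2[order(df2$id, df2$year), ]

# Same idea as the answer: keep the first 1 in each id, set everything else to 0.
# For id 3, match() returns NA and replace() ignores an NA index,
# so that id simply stays all 0.
df2$treatment_year <- with(df2, ave(treated, id, FUN = function(x)
  replace(x, -match(1L, x), 0L)))

df2
```

After sorting, id 2 is flagged in 2001 rather than in 2002, the year that happened to come first in the unsorted data.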

How it works:

match returns the position of the first match. Consider this example:

x <- c(0, 0, 1, 1, 1)
match(1, x)
#[1] 3

The first 1 is at the 3rd position. Negating that index with - excludes it, so replace sets every other value to 0.

replace(x, -match(1, x), 0)
#[1] 0 0 1 0 0

If x only ever contains 0/1 values and always has at least one 1, we can also use which.max instead of match.

which.max(x)
#[1] 3
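The at-least-one-1 condition matters because which.max always returns a position, even when a group contains no 1 at all. In the replace form this happens to be harmless for an all-zero group (the kept position is already 0), but a position-based flag, like the commented dplyr alternative with which.max swapped in for match, would mark the first row. A small base R check, using a hypothetical never-treated vector x_never and %in% as one way to turn match's NA into FALSE:

```r
x_never <- c(0, 0, 0)  # hypothetical id that is never treated

match(1, x_never)      # NA: there is no 1 to find
which.max(x_never)     # 1: position of the first maximum, which here is a 0

# A position-based flag built on which.max wrongly marks the first row ...
as.integer(seq_along(x_never) == which.max(x_never))
#[1] 1 0 0

# ... while %in% turns match()'s NA into FALSE, so nothing is flagged.
as.integer(seq_along(x_never) %in% match(1, x_never))
#[1] 0 0 0
```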
