I have some panel data with id, year and a variable indicating whether the individuals are treated at that point in time:
id year treated
1 2000 0
1 2001 0
1 2002 1
1 2003 1
1 2004 1
I need to create a dummy to indicate the year in which the treatment first happened. The desired output is something like:
id year treated treatment_year
1 2000 0 0
1 2001 0 0
1 2002 1 1
1 2003 1 0
1 2004 1 0
It seems fairly simple, but I've been stuck for a while and cannot get any ordering function to do this. Thanks a lot for any help.
Upvotes: 0
We could create a logical index with row_number() and which.max(), then coerce it to binary with a unary +:
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(treatment_year = +(row_number() == which.max(treated)))
# A tibble: 5 x 4
# Groups: id [1]
# id year treated treatment_year
# <int> <int> <int> <int>
#1 1 2000 0 0
#2 1 2001 0 0
#3 1 2002 1 1
#4 1 2003 1 0
#5 1 2004 1 0
Or build a logical expression with duplicated(), which is TRUE only where a 1 appears for the first time within each id:
df1 %>%
  group_by(id) %>%
  mutate(treatment_year = +(!duplicated(treated) & as.logical(treated)))
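To see why the duplicated() version works, it can help to evaluate its pieces on a plain vector. This is only a minimal sketch in base R; x is a hypothetical stand-in for the treated column of a single id, and the intermediate names first_seen and is_treated are illustrative, not part of the answer above.

```r
# x stands in for the `treated` column of one id (illustrative values)
x <- c(0L, 0L, 1L, 1L, 1L)

first_seen <- !duplicated(x)   # TRUE at the first 0 and at the first 1
is_treated <- as.logical(x)    # TRUE wherever x is 1

# TRUE only where a 1 appears for the first time; unary + converts to 0/1
treatment_year <- +(first_seen & is_treated)
treatment_year
```

Because the second condition requires x to be 1, a group that is never treated should end up with 0 in every row under this approach.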
Data used above:
df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), year = 2000:2004,
    treated = c(0L, 0L, 1L, 1L, 1L)), class = "data.frame",
    row.names = c(NA, -5L))
Upvotes: 1
You can use match() to get the index of the first 1 within each id and replace every other value with 0. Using dplyr:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(treatment_year = replace(treated, -match(1L, treated), 0L))
# Can also use:
# mutate(treatment_year = +(row_number() == match(1L, treated)))
# id year treated treatment_year
# <int> <int> <int> <int>
#1 1 2000 0 0
#2 1 2001 0 0
#3 1 2002 1 1
#4 1 2003 1 0
#5 1 2004 1 0
In base R:
df$treatment_year <- with(df, ave(treated, id, FUN = function(x)
  replace(x, -match(1L, x), 0L)))
And with data.table:
library(data.table)
setDT(df)[, treatment_year := replace(treated, -match(1L, treated), 0L), id]
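The question shows a single id, so here is a hedged sketch of how the same match-based expression behaves on a panel with several individuals. The data frame panel and its values are made up for illustration; only the ave()/replace()/match() expression comes from the base R version above.

```r
# Hypothetical panel: id 1 is treated from 2002, id 2 is never treated,
# id 3 is already treated in its first year
panel <- data.frame(
  id      = rep(1:3, each = 3),
  year    = rep(2001:2003, times = 3),
  treated = c(0L, 1L, 1L,  0L, 0L, 0L,  1L, 1L, 1L)
)

# Same expression as the base R version, applied separately within each id
panel$treatment_year <- with(panel, ave(treated, id, FUN = function(x)
  replace(x, -match(1L, x), 0L)))

panel
```

For the never-treated id, match() returns NA, and an NA index in replace() selects nothing, so that group keeps all zeros.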
How it works: match() returns the index of the first match. Consider this example:
x <- c(0, 0, 1, 1, 1)
match(1, x)
#[1] 3
The first 1 is at position 3. Negating that index excludes it from the replacement, so replace() sets every other value to 0.
replace(x, -match(1, x), 0)
#[1] 0 0 1 0 0
If x only ever contains 0/1 values and always has at least one 1, we can also use which.max() instead of match().
which.max(x)
#[1] 3
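The "at least one 1" condition matters. As a small sketch of what happens otherwise, consider a hypothetical never-treated individual (the vector x0 and the result names are illustrative assumptions, not from the original answer):

```r
# A hypothetical never-treated individual: no 1 anywhere
x0 <- c(0, 0, 0)

match(1, x0)      # NA: there is no 1 to find
which.max(x0)     # position of the first maximum, which here is a 0

# match-based version: the NA index selects nothing, so x0 is left as is
with_match <- replace(x0, -match(1, x0), 0)

# which.max-based version: flags row 1 even though it was never treated
with_which_max <- +(seq_along(x0) == which.max(x0))
```

Here with_match stays all zeros, while with_which_max marks the first row, so the match() variant is the safer choice when some ids may never be treated.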
Upvotes: 1