krasnapolsky

Reputation: 357

Stata, make a variable based on the relative position to other observations

I am performing an event study; a reproducible example is below. It includes only one unit, but that is enough for the question.

input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

I generate dif_year, which should hold the number of years between each observation and the treatment:

sort unit year
bysort unit: gen year_nb = _n
bysort unit: gen year_target = year_nb if treatment == 1
by unit: egen target_distance = min(year_target)
drop year_target
gen dif_year = year_nb - target_distance
drop year_nb target_distance

This works well with one treatment per unit, but here I have two. With the snippet above, I get the following result:

unit year treatment dif_year
1 2000 0 -2
1 2001 0 -1
1 2002 1 0
1 2003 0 1
1 2004 0 2
1 2005 1 3
1 2006 0 4
1 2007 0 5

You can see that dif_year is anchored to the first treatment (2002) and ignores the second one (2005). How can I adapt dif_year so that it works with multiple treatments? The values for 2003 and before are correct, but I would expect -1 for 2004, 0 for 2005, 1 for 2006 and 2 for 2007.

Upvotes: 0

Views: 526

Answers (3)

Nick Cox

Reputation: 37208

This solution uses no loops. Evidently the problem hinges on looking backwards as well as forwards; hence reversing time temporarily is a device that can be used.

clear 
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

bysort unit (year) : gen wanted1 = 0 if treatment 
by unit: replace wanted1 = wanted1[_n-1] + 1 if missing(wanted1)
gen negyear = -year 
bysort unit (negyear) : gen wanted2 = 0 if treatment 
by unit: replace wanted2 = wanted2[_n-1] + 1 if missing(wanted2)

gen wanted = cond(abs(wanted2) < abs(wanted1), - wanted2, wanted1)

sort unit year 

list , sep(0) 

     +---------------------------------------------------------------+
     | unit   year   treatm~t   wanted1   negyear   wanted2   wanted |
     |---------------------------------------------------------------|
  1. |    1   2000          0         .     -2000         2       -2 |
  2. |    1   2001          0         .     -2001         1       -1 |
  3. |    1   2002          1         0     -2002         0        0 |
  4. |    1   2003          0         1     -2003         2        1 |
  5. |    1   2004          0         2     -2004         1       -1 |
  6. |    1   2005          1         0     -2005         0        0 |
  7. |    1   2006          0         1     -2006         .        1 |
  8. |    1   2007          0         2     -2007         .        2 |
     +---------------------------------------------------------------+
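
If the years within a unit can have gaps, counting rows is not the same as counting years. The following is a minimal sketch of the same reversing-time device applied to calendar years rather than row counts. The names prev_treat, next_treat and wanted_years are introduced here for illustration only, and treatment is assumed to be coded 0/1.

```stata
clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

* Year of the most recent treatment at or before each observation
bysort unit (year): gen prev_treat = year if treatment == 1
by unit: replace prev_treat = prev_treat[_n-1] if missing(prev_treat)

* Year of the next treatment at or after each observation (time reversed)
gen negyear = -year
bysort unit (negyear): gen next_treat = year if treatment == 1
by unit: replace next_treat = next_treat[_n-1] if missing(next_treat)

* Signed distance in years to the nearer treatment; ties go to the previous one.
* A missing distance sorts above any number in Stata, so a side with no
* treatment loses the comparison; units with no treatment at all stay missing.
gen wanted_years = cond(next_treat - year < year - prev_treat, year - next_treat, year - prev_treat)

drop negyear
sort unit year
list, sep(0)
```

On the example data, which has no gaps, wanted_years should agree with wanted above.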

   

Upvotes: 1

krasnapolsky

Reputation: 357

I found a quick fix to my own question.

I generate a variable that is missing wherever there is no treatment. I then loop over distances: at step i, each row that is i rows after or before a treatment year and still missing takes that treatment's value. This continues until no missing values remain.

Here the farthest observation is only two rows from a treatment, so a few iterations are enough. I run the loop up to i = 10 just to show that extra iterations don't change the outcome. In general the upper bound must be at least the largest distance to a treatment.

* Row number within unit, in year order
bysort unit (year): gen year_nb = _n
by unit: gen year_target = year_nb if treatment == 1

gen closest_treatment = year_target

forvalues i = 1/10 {
    * Look back i rows first, so ties go to the previous treatment
    by unit: replace closest_treatment = closest_treatment[_n-`i'] if !missing(year_target[_n-`i']) & missing(closest_treatment)
    * Then look ahead i rows
    by unit: replace closest_treatment = closest_treatment[_n+`i'] if !missing(year_target[_n+`i']) & missing(closest_treatment)
}
replace year_target = closest_treatment if missing(year_target)
drop closest_treatment

gen dif_year = year_nb - year_target
drop year_nb year_target

Edit: in my example, the number of rows between the two treatments is even. The solution also works when that number is odd: the last row to be filled is then exactly midway between two treatments. Which treatment it is assigned to only matters for the sign, which you presumably care about in an event study (e.g. a row 3 years after the previous treatment is also 3 years before the next one, so its value is either +3 or -3). This snippet assigns such a row to the previous treatment (positive sign). If you want the opposite, swap the two lines inside the loop.
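
If the hardcoded upper bound of 10 is a concern, one option is to derive it from the data, since no observation can be farther from a treatment than the length of its own panel. The following is only a sketch of that idea on the question's data; panel_length and max_steps are names made up for this example.

```stata
clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

bysort unit (year): gen year_nb = _n
by unit: gen year_target = year_nb if treatment == 1
gen closest_treatment = year_target

* Longest panel in the data: a safe (if generous) bound on the distance
by unit: gen panel_length = _N
quietly summarize panel_length
local max_steps = r(max)

forvalues i = 1/`max_steps' {
    * Previous treatment first, so ties get a positive sign
    by unit: replace closest_treatment = closest_treatment[_n-`i'] if !missing(year_target[_n-`i']) & missing(closest_treatment)
    by unit: replace closest_treatment = closest_treatment[_n+`i'] if !missing(year_target[_n+`i']) & missing(closest_treatment)
}

gen dif_year = year_nb - closest_treatment
drop year_nb year_target closest_treatment panel_length
```

The bound is usually larger than needed, so some iterations change nothing. That costs time on large panels but does not affect the result.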

Upvotes: 0

TheIceBear

Reputation: 3255

Here is a solution where the largest number of years does not need to be hardcoded.

clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
1 2008 0
1 2009 0
1 2010 1
end

sort unit year

*Set all treatment years to 0
gen diff_year = 0 if treatment == 1

*Initialize locals used in the loop
local stop "false"
local diff_distance = 0

while "`stop'" == "false" {
    
    **Replace diff to one more than diff on row above if unit is the same, 
    * no diff for this row, and diff on row above is the diff distance 
    * for this iteration of the loop.
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    
    **Replace diff to one less than diff on row below if unit is the same, 
    * no diff for this row, and diff on row below is minus the diff distance 
    * for this iteration of the loop.
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == `diff_distance' * -1
    
    *Test if there are still missing values, and if so set stop local to true
    count if missing(diff_year)
    if `r(N)' == 0 local stop "true"
    
    *Increment the diff distance by one for next loop
    local diff_distance = `diff_distance' + 1
    
}
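
One caveat: if some unit has no treatment at all, its rows can never be filled, so the count of missing values never reaches zero and the loop above does not stop. A possible guard, sketched here under the assumption that such units should simply keep missing values, is to also stop once an iteration fills nothing. The local n_missing and the second unit in the example data are additions for illustration.

```stata
clear
input unit year treatment
1 2000 0
1 2001 1
1 2002 0
2 2000 0
2 2001 0
2 2002 0
end

sort unit year
gen diff_year = 0 if treatment == 1

local diff_distance = 0
quietly count if missing(diff_year)
local n_missing = r(N)

while `n_missing' > 0 {
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == `diff_distance' * -1

    * If this pass filled nothing, no later pass can fill anything either
    quietly count if missing(diff_year)
    if r(N) == `n_missing' {
        continue, break
    }
    local n_missing = r(N)
    local diff_distance = `diff_distance' + 1
}

list, sepby(unit)
```

Stopping on "no progress" is safe here because each pass only extends values written by the previous pass; once a pass writes nothing, there is nothing left to extend.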

Upvotes: 1
