I have a problem subsetting my dataset in Stata. Basically, within each group, I want to keep or drop an observation in which a certain action is performed (as indicated by a variable) depending on the past values of another variable. Suppose I have the following data:
obs | id | action1 | action2 | year
1 | 1 | 1 | 0 | 2000
2 | 1 | 0 | 1 | 2001
3 | 1 | 0 | 1 | 2002
4 | 1 | 0 | 1 | 2002
5 | 1 | 0 | 1 | 2003
6 | 2 | 1 | 0 | 2000
7 | 2 | 1 | 0 | 2001
8 | 2 | 0 | 1 | 2002
9 | 2 | 0 | 1 | 2002
10 | 2 | 0 | 1 | 2003
For each group identified by id, I want to keep an observation only if action1 is performed in it, or if action2 is performed and action1 was performed no earlier than 2 years before. In this simplified example only observation 5 should be deleted (it is in 2003, while the only action1 for id 1 is in 2000). Please note that the two actions are not mutually exclusive and each can be performed more than once within the same year, so looking two observations back does not necessarily mean looking two years back.
One solution, which I am not able to implement in code, would be: gen act1year = action1 * year; then, by id, store the nonzero values of act1year somewhere (this is the part I cannot implement); and finally, by id, keep if action1 == 1, or if action2 == 1 and the range from year - 2 to year contains at least one of the stored values.
I know my suggestion is probably not the easiest way to go, and I still cannot implement it; unfortunately, I have not found any code that helps me do this. I hope you can help. Thanks,
Francesco
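For reference, here is a minimal Stata sketch of the rule stated in the question, working on years rather than on previous observations (an editorial sketch, not the answerer's method). It assumes the window is inclusive (an action1 exactly 2 years earlier still counts), that an action1 in the same year as the action2 counts as past, and that rows with neither action are dropped. The helper variables last_a1 and neg_a1 are hypothetical names. Instead of storing every action1 year, it carries forward only the most recent one, which is enough: if the latest action1 is more than 2 years back, every earlier one is too.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2002
5 1 0 1 2003
6 2 1 0 2000
7 2 1 0 2001
8 2 0 1 2002
9 2 0 1 2002
10 2 0 1 2003
end

* Hypothetical helper: year of the row if it has action1, missing otherwise
generate last_a1 = year if action1 == 1
* Hypothetical helper: sorting on -action1 puts action1 rows first within a year,
* so an action1 in the same year counts as "past" for the action2 rows
generate neg_a1 = -action1

* Carry the latest action1 year forward within each id; replace works row by
* row, so each filled value is available to the next row of the group
bysort id (year neg_a1): replace last_a1 = last_a1[_n-1] if missing(last_a1)

* year - last_a1 is missing when no action1 happened yet, and missing <= 2 is
* false, so such action2 rows are dropped (assumed window: 0 to 2 years)
keep if action1 == 1 | (action2 == 1 & year - last_a1 <= 2)

drop neg_a1 last_a1
sort obs
list, sepby(id)
```

On the question's data this keeps nine rows and drops only observation 5.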
The following code rests on a few assumptions, discussed after it.
clear
set more off
input ///
obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 1 0 2000
6 2 0 1 2001
7 2 1 0 2002
8 2 0 1 2003
end
list, sepby(id)
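*-----
* Within each id, sorted by year, keep rows where action1 == 1 or where
* action1 occurred in at least one of the two previous observations
bysort id (year) : keep if action1 | (action1[_n-1] + action1[_n-2] > 0)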
list, sepby(id)
The expression in parentheses evaluates to one or zero depending on whether the inequality is true or false, respectively. It indicates whether action 1 was taken in either of the previous two observations.
You need to decide what to do with the first two observations of each id, as they cannot be compared with two previous observations (those do not exist). In the example above they are always kept: referring to a non-existent observation returns a missing value, adding a missing value to a number gives missing, and Stata treats missing as larger than any number, so the inequality is true.
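A small Stata sketch of that missing-value behaviour, on the answer's data, together with one way to change it. The alternative keep condition (comparing each lag with 1 instead of summing the lags) is an editorial variant, not the answerer's code: == 1 is false for a missing lag, so an observation with fewer than two predecessors is no longer kept automatically.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 1 0 2000
6 2 0 1 2001
7 2 1 0 2002
8 2 0 1 2003
end

* Missing plus a number is missing, and missing is larger than any number,
* so this displays 1
display (. + 1 > 0)

* Variant: a missing lag makes "== 1" false, so rows at the start of an id
* are kept only if action1 itself or an existing predecessor qualifies
bysort id (year): keep if action1 == 1 | action1[_n-1] == 1 | action1[_n-2] == 1
list, sepby(id)
```

On this particular data both versions drop only observation 4; they differ only when an id starts with action2 rows that have no earlier action1.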
You can also work with time-series operators (help tsvarlist, help xtset) and properly respect the time variable. Here, I work with the previous two observations, which may or may not coincide with the previous two time points.
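A sketch of the time-series route, using the question's data: collapse to one row per id and year, declare the panel with tsset, and use the lag operators L1. and L2., which refer to calendar years rather than previous observations. As in the question, the window is assumed to include the current year and the year exactly two years back; the variables a1_any and a1_window and the tempfile window are hypothetical names. tsfill adds any missing years so that a lag never jumps over a gap, and max() is used because it ignores missing lags, so the first years of a panel are not kept by accident.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2002
5 1 0 1 2003
6 2 1 0 2000
7 2 1 0 2001
8 2 0 1 2002
9 2 0 1 2002
10 2 0 1 2003
end

preserve
* One row per id and year: did action1 happen at least once that year?
collapse (max) a1_any = action1, by(id year)
tsset id year
* Fill gaps in the years so that L1. and L2. mean one and two years back
tsfill
replace a1_any = 0 if missing(a1_any)
* 1 if action1 happened this year or in either of the two previous years;
* max() ignores missing lags unless all arguments are missing
generate byte a1_window = max(a1_any, L1.a1_any, L2.a1_any)
keep id year a1_window
tempfile window
save `window'
restore

* Attach the year-level flag to every original row
merge m:1 id year using `window', keep(master match) nogenerate
keep if action1 == 1 | (action2 == 1 & a1_window == 1)
drop a1_window
sort obs
list, sepby(id)
```

This also keeps nine rows and drops only observation 5; the collapse step is what makes repeated actions within a year harmless.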
I think your two actions are mutually exclusive, but you are not explicit about it.