Reputation: 346
I have a dataset in which a household id (hhid) and a member id (mid) identify a unique person. I have results from two separate surveys taken a year apart (surveyYear). I also have data on whether or not the individual was enrolled in school at the time.
I want a binary variable that indicates whether the individual in question dropped out of school between the surveys (i.e. 1 if dropped and 0 otherwise).
I have a decent understanding of Stata, but this coding challenge seems a little beyond me because I am not sure how to compare the in-school status of the later id with the earlier id and then propagate that result into a binary column.
Here is an example of what I need.
Previously:
     +----------------------------------+
     | hhid   mid   survey~r   inschool |
     |----------------------------------|
  1. |    1     2          3          1 |
  2. |    1     2          4          1 |
  3. |    1     3          3          1 |
  4. |    1     3          4          1 |
  5. |    2     1          3          1 |
  6. |    2     1          4          0 |
  7. |    2     2          3          0 |
  8. |    2     2          4          0 |
     +----------------------------------+
After:
     +--------------------------------------------+
     | hhid   mid   survey~r   inschool   dropped |
     |--------------------------------------------|
  1. |    1     2          3          1         0 |
  2. |    1     2          4          1         0 |
  3. |    1     3          3          1         0 |
  4. |    1     3          4          1         0 |
  5. |    2     1          3          1         1 |
  6. |    2     1          4          0         1 |
  7. |    2     2          3          0         0 |
  8. |    2     2          4          0         0 |
     +--------------------------------------------+
Upvotes: 0
Views: 305
Reputation: 37208
bysort hhid mid (surveyyear) : gen dropped = inschool[1] == 1 & inschool[2] == 0
The commentary is longer than the code:
Within blocks of observations with the same hhid and mid, sort by surveyyear.
You want students who were inschool in year 3 but not in year 4. So, inschool is 1 in the first observation and 0 in the second.
Here subscripting [1] and [2] refers to order within blocks of observations defined by the by: statement.
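To see this at work, here is a minimal sketch that rebuilds the question's example data with input and then runs the one-liner. It assumes exactly the eight observations shown in the question. The variable is typed as surveyyear to match the command above; Stata variable names are case-sensitive, so if your variable is really named surveyYear, as the question writes it, type it that way instead.

```stata
* Sketch only: recreate the example data from the question.
* The name surveyyear is an assumption; adjust the case to match your dataset.
clear
input hhid mid surveyyear inschool
1 2 3 1
1 2 4 1
1 3 3 1
1 3 4 1
2 1 3 1
2 1 4 0
2 2 3 0
2 2 4 0
end

* Within each (hhid, mid) block, sorted by surveyyear, [1] is the earlier
* survey and [2] is the later one. The logical expression evaluates to 1 or 0,
* and the same value is assigned to every observation in the block.
bysort hhid mid (surveyyear): gen dropped = inschool[1] == 1 & inschool[2] == 0

* Show the result, with a separator line between persons.
list, sepby(hhid mid)
```

One design point worth knowing: if a person appears in only one survey, inschool[2] refers to an observation that does not exist in that block and is returned as missing, so the comparison with 0 is false and dropped is 0 for that person. Whether that is the coding you want is a separate decision.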
If further detail is needed see e.g. this article. Note that contrary to one tag, no loop is needed (or, if you wish, that the loop over possibilities is built in to the by: framework).
Upvotes: 1