Reputation: 3
I have data in the following format (there are a lot more variables):
year ID Dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
What I am trying to achieve is to conditionally create new observations that fill in the years between two points in time, depending on a dummy. If the dummy is equal to 1, the data are supposed to be filled in. If the dummy is equal to 0, they should not be filled in.
For example:
year ID Dummy
1495 65 1
1496 65 1
1497 65 1
1498 65 1
.
.
1501 65 1
1502 65 1
1503 65 1
1504 65 1
.
.
.
1520 65 0
1522 65 0
Upvotes: 0
Here's one way to do this:
clear
input year id dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
end
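* Flag observations with dummy == 1 whose year differs from the next observation's
generate tag = year[_n] != year[_n+1] & dummy == 1
* Negative gap to the next observation's year
generate delta = year[_n] - year[_n+1] if tag
* Consecutive years leave nothing to fill
replace delta = . if abs(delta) == 1
* Replace each flagged observation with abs(delta) copies of itself
expand abs(delta) if tag & delta != .
sort year
* Number the copies 0, 1, 2, ... within each year and shift the year by that amount
bysort year: egen seq = seq() if delta != .
replace seq = seq - 1
replace seq = 0 if seq == .
replace year = year + seq if year != .
drop tag delta seq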
The above code snippet will produce:
list
+-------------------+
| year id dummy |
|-------------------|
1. | 1495 65 1 |
2. | 1496 65 1 |
3. | 1497 65 1 |
4. | 1498 65 1 |
5. | 1499 65 1 |
|-------------------|
6. | 1500 65 1 |
7. | 1501 65 1 |
8. | 1502 65 1 |
9. | 1503 65 1 |
10. | 1504 65 1 |
|-------------------|
11. | 1505 65 1 |
12. | 1506 65 1 |
13. | 1507 65 1 |
14. | 1508 65 1 |
15. | 1509 65 1 |
|-------------------|
16. | 1510 65 1 |
17. | 1511 65 1 |
18. | 1512 65 1 |
19. | 1513 65 1 |
20. | 1514 65 1 |
|-------------------|
21. | 1515 65 1 |
22. | 1516 65 1 |
23. | 1517 65 1 |
24. | 1518 65 1 |
25. | 1519 65 1 |
|-------------------|
26. | 1520 65 0 |
27. | 1522 65 0 |
+-------------------+
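The snippet above compares each observation with the next one regardless of id, so it fits the single-id example. If the real data hold several ids, a panel-aware variant might look like the sketch below. It is only a sketch under assumptions that are not stated in the question: year is integer-valued, each (id, year) pair appears at most once, and the helper variables gap and seq are hypothetical names chosen for illustration.

```stata
clear
input year id dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
end

* Within each id, number of years from this observation to the next one
* (missing for the last observation of each id)
bysort id (year): generate gap = year[_n+1] - year

* Make gap copies of each dummy == 1 observation that is followed by a gap
* of more than one year; missing gaps are excluded explicitly because
* missing values compare as larger than any number in Stata
expand gap if dummy == 1 & gap > 1 & !missing(gap)

* Copies share the same id and year: number them 0, 1, 2, ... and shift the year
* (assumes (id, year) was unique before expand)
bysort id year: generate seq = _n - 1
replace year = year + seq

drop gap seq
sort id year
list, sepby(id)
```

For the example data this should give the same 27 observations as the listing above. The only design change is that the gap is computed with bysort id (year), so the last year of one id is never compared with the first year of the next.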
Upvotes: 1