Reputation: 25
I have panel data:
Id | Wave| Localisation| Baseline
1 | 1 | AA | 1
1 | 2 | . | 0
1 | 3 | . | 0
2 | 2 | AB | 1
2 | 3 | . | 0
3 | 1 | AB | 1
3 | 3 | . | 0
4 | 2 | AC | 1
4 | 3 | . | 0
Some variable values of one panel (hhsize
, localisation
, whatever) serves as a reference (these values are included in the baseline interview only).
By consequence, for each id
we do not have all the information. For the id==1
for instance, we have missing values in the column Localisation
for not-baseline-interview (baseline==0
).
I would like to spread the baseline values to each panel. That is, I want to replace missing values .
in column Localisation
by Localisation
given in the baseline interview for each id
.
In fact, the information Localisation
remain the same across different waves. So it is useful to know the localization for each wave for one person.
Upvotes: 0
Views: 635
Reputation:
If the data pattern you present is the same for the entire dataset, then the following should work:
clear
input Id Wave str2 Localisation Baseline
1 1 AA 1
1 2 . 0
1 3 . 0
2 2 AB 1
2 3 . 0
3 1 AB 1
3 3 . 0
4 2 AC 1
4 3 . 0
end
bysort Id (Wave): replace Localisation = Localisation[1] if Localisation == "."
list, sepby(Id) abbreviate(15)
+-------------------------------------+
| Id Wave Localisation Baseline |
|-------------------------------------|
1. | 1 1 AA 1 |
2. | 1 2 AA 0 |
3. | 1 3 AA 0 |
|-------------------------------------|
4. | 2 2 AB 1 |
5. | 2 3 AB 0 |
|-------------------------------------|
6. | 3 1 AB 1 |
7. | 3 3 AB 0 |
|-------------------------------------|
8. | 4 2 AC 1 |
9. | 4 3 AC 0 |
+-------------------------------------+
Upvotes: 2