Reputation: 851
The original data looks like this (Date
is in dd/mm/yyyy format):
ID Date Var
01 21/01/2016 1
01 22/01/2016
02 13/05/2016
02 14/05/2016 2
03 08/06/2016 3
03 08/06/2016 4
04 09/07/2016 5
04 09/07/2016
05 10/08/2016
05 10/08/2016 6
In essence, for each ID
there are three scenarios for the variable of interest Var
in regards to the date of submission Date
:
1/ Var
present on the earlier Date
but missing on the later one (ID=01)
2/ Var
missing on the earlier Date
but present on the later one (ID=02)
3/ Var
present on both Date
, whose values may be the same or different (ID=03)
.
Now how can I modify Var
as follows?
1/ Use value of Var
of the earlier Date
to fill up the later one.
2/ Use value of Var
of the later Date
to fill up the earlier one.
3/ Use value of Var
of the later Date
to replace the earlier one.
So the modified data should look like:
ID Date Var
01 21/01/2016 1
01 22/01/2016 1
02 13/05/2016 2
02 14/05/2016 2
03 08/06/2016 4
03 08/06/2016 4
Upvotes: 0
Views: 488
Reputation: 37278
Your examples are solved by
bysort ID (Var): replace Var = Var[1] if missing(Var)
bysort ID (Date): replace Var = Var[_N]
Replacing missing values in series using nearby known values is called interpolation. See e.g. here for notice of an interpolation command supporting the first method more generally (and other methods too).
Upvotes: 1