Reputation: 55
I have a dataset with observations at specific timepoints, but those timepoints (and the length of time between them) vary by group. I'm trying to "fill down" the data so that existing observations are carried down into missing cells. But I only want to do this for a certain number of rows after the original observation. So for example, I could have a dataset that looks like this:
For group A, I'd want to fill in the value for 2002 with 2001's value, 2004 with 2003, etc. I wouldn't want to fill in 2000 at all, since I don't have the preceding value. And I ALSO wouldn't want to fill in the 2011 value, because the "cyclelength" variable tells me that group A's observations are supposed to take place every two years, so I don't want to carry data forward past that. 2011 is just a genuinely missing value.
Similarly, in group B, I'd want to carry 2000's value forward into years 2001, 2002, and 2003 (because the "cyclelength" here is 4 years). I'd want to carry 2004's value into 2005, 2006, and 2007, but not beyond that--the later years should stay missing.
I've tried setting this up with the "carryforward" command, but haven't figured out how to have it stop filling down after a specified number of years that varies by group. Is there a way to do this, either with carryforward or otherwise?
Upvotes: 0
Views: 14574
Reputation: 37183
This is a variation on a problem documented since 2000 as an FAQ: see here
The variation lies in limiting how far non-missing values are copied. But it falls easily to the same idea.
Each known value was recorded in a particular year. We can note that year and copy it down the dataset within each group:
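* Year in which each recorded (non-missing) value was observed
gen when_last_known = year if !missing(value)
* Within each group, in year order, copy that year down into the rows that follow
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)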
Now the replacement wanted is
* Carry the value forward only while still inside the group's cycle
by group : replace value = value[_n-1] if missing(value) & (year - when_last_known) < cyclelength
That statement presupposes the sort order of the previous statement.
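To see the logic at work, here is a minimal sketch on a toy dataset. The numbers are invented for illustration and are not the asker's data; only the variable names group, year, value and cyclelength come from the question. Rows before a group's first observation stay missing because when_last_known is missing there, so year - when_last_known is missing too, and Stata treats a missing value as larger than any number, which makes the < comparison false.

```stata
* Hypothetical toy data, invented for illustration only
clear
input str1 group year value cyclelength
"A" 2000 .  2
"A" 2001 10 2
"A" 2002 .  2
"A" 2003 12 2
"A" 2004 .  2
"A" 2005 .  2
"B" 2000 5  4
"B" 2001 .  4
"B" 2002 .  4
"B" 2003 .  4
"B" 2004 .  4
"B" 2005 .  4
end

* Year of the most recent recorded value, copied down within group
gen when_last_known = year if !missing(value)
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)

* Fill down, but only within cyclelength years of the last recorded value
by group : replace value = value[_n-1] if missing(value) & (year - when_last_known) < cyclelength

* Checks: filled inside the cycle, left missing outside it
assert value == 10 if group == "A" & year == 2002
assert value == 12 if group == "A" & year == 2004
assert missing(value) if group == "A" & inlist(year, 2000, 2005)
assert value == 5 if group == "B" & inrange(year, 2000, 2003)
assert missing(value) if group == "B" & year >= 2004

list, sepby(group)
```

Because replace works through the observations in order, each filled value is available to the next row, which is what lets a single value cascade across several years in group B.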
On Statalist (see here) you'd be expected to document that carryforward is a user-written command to be installed from SSC. That's a good convention here too.
In practice, it's good data management to keep the original data exactly as they arrive and do this on a clone of the variable. Sooner or later someone will ask to see the original values, and then you could be seriously embarrassed.
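One way to follow that advice is sketched below, using the official clonevar command to make the working copy. The names value_filled and filled are illustrative choices, not anything from the question, and the three-row dataset is invented so the snippet runs on its own.

```stata
* Hypothetical toy data, invented for illustration only
clear
input str1 group year value cyclelength
"A" 2001 10 2
"A" 2002 .  2
"A" 2003 .  2
end

* Work on a copy; value itself is never modified
* (value_filled is an illustrative name)
clonevar value_filled = value

* Base the bookkeeping on the original, untouched variable
gen when_last_known = year if !missing(value)
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)
by group : replace value_filled = value_filled[_n-1] if missing(value_filled) & (year - when_last_known) < cyclelength

* Flag rows whose value was carried forward rather than observed
gen byte filled = missing(value) & !missing(value_filled)

list, sepby(group)
```

With this layout the original values and the filled ones sit side by side, and the filled indicator makes it easy to exclude carried-forward rows later if someone asks for the data as recorded.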
Upvotes: 3