Reputation: 879
I would like to fill in the missing values with the average of the closest values. I previously used the approach described by Nick Cox, however, this time I have a slightly more complicated setup, with which I think the standard way of running this does not work.
As the sample shows, the chunks of missing data are not of equal length or not single cells. In order to fill them with the average of the closest values observed, I think I need a two-step solution: (1) identify the two closest values - the one before and the one after (if one of those two is missing, just use the value of the one observed in step 2 instead of their average) (2) calculate their average and fill the missing with this average value. The data is yearly, nested in countries. I post the values for the first country, Albania.
I need to do this for several variables, so perhaps a loop solution would work best. But that's just a side point, don't hesitate to post an answer without loops.
country_name year gini
Albania 1990
Albania 1991
Albania 1992
Albania 1993
Albania 1994
Albania 1995
Albania 1996 28
Albania 1997
Albania 1998
Albania 1999
Albania 2000
Albania 2001
Albania 2002 31.74
Albania 2003
Albania 2004
Albania 2005 30.6
Albania 2006
Albania 2007
Albania 2008 29.98
Albania 2009
Albania 2010
Albania 2011
Albania 2012 28.96
Albania 2013
Albania 2014
Albania 2015
Albania 2016
Thank you!
Upvotes: 0
Views: 570
Reputation: 37358
Here's a self-contained example assuming that mipolate
has been installed by ssc inst mipolate
. The pchip method often works really well; nevertheless for these data I'd be inclined to use linear interpolation.
clear
input str7 country_name year gini
Albania 1990 .
Albania 1991 .
Albania 1992 .
Albania 1993 .
Albania 1994 .
Albania 1995 .
Albania 1996 28
Albania 1997 .
Albania 1998 .
Albania 1999 .
Albania 2000 .
Albania 2001 .
Albania 2002 31.74
Albania 2003 .
Albania 2004 .
Albania 2005 30.6
Albania 2006 .
Albania 2007 .
Albania 2008 29.98
Albania 2009 .
Albania 2010 .
Albania 2011 .
Albania 2012 28.96
Albania 2013 .
Albania 2014 .
Albania 2015 .
Albania 2016 .
end
mipolate gini year, gen(linear) epolate by(country)
mipolate gini year, gen(pchip) pchip by(country)
line linear pchip year || scatter gini year, ms(Oh)
Upvotes: 1