Erdne Htábrob
Erdne Htábrob

Reputation: 879

fill the missings with the average of the closest observed

I would like to fill in the missing values with the average of the closest values. I previously used the approach described by Nick Cox, however, this time I have a slightly more complicated setup, with which I think the standard way of running this does not work.

As the sample shows, the chunks of missing data are not of equal length or not single cells. In order to fill them with the average of the closest values observed, I think I need a two-step solution: (1) identify the two closest values - the one before and the one after (if one of those two is missing, just use the value of the one observed in step 2 instead of their average) (2) calculate their average and fill the missing with this average value. The data is yearly, nested in countries. I post the values for the first country, Albania.

I need to do this for several variables, so perhaps a loop solution would work best. But that's just a side point, don't hesitate to post an answer without loops.

country_name    year    gini
Albania 1990    
Albania 1991    
Albania 1992    
Albania 1993    
Albania 1994    
Albania 1995    
Albania 1996    28
Albania 1997    
Albania 1998    
Albania 1999    
Albania 2000    
Albania 2001    
Albania 2002    31.74
Albania 2003    
Albania 2004    
Albania 2005    30.6
Albania 2006    
Albania 2007    
Albania 2008    29.98
Albania 2009    
Albania 2010    
Albania 2011    
Albania 2012    28.96
Albania 2013    
Albania 2014    
Albania 2015    
Albania 2016    

Thank you!

Upvotes: 0

Views: 570

Answers (1)

Nick Cox
Nick Cox

Reputation: 37358

Here's a self-contained example assuming that mipolate has been installed by ssc inst mipolate. The pchip method often works really well; nevertheless for these data I'd be inclined to use linear interpolation.

clear 
input str7 country_name    year    gini
Albania 1990    .
Albania 1991    .
Albania 1992    .
Albania 1993    .
Albania 1994    .
Albania 1995    .
Albania 1996    28
Albania 1997    .
Albania 1998    .
Albania 1999    .
Albania 2000    .
Albania 2001    .
Albania 2002    31.74
Albania 2003    .
Albania 2004    .
Albania 2005    30.6
Albania 2006    .
Albania 2007    .
Albania 2008    29.98
Albania 2009    .
Albania 2010    .
Albania 2011    .
Albania 2012    28.96
Albania 2013    .
Albania 2014    .
Albania 2015    .
Albania 2016    .
end 

mipolate gini year, gen(linear) epolate by(country)  
mipolate gini year, gen(pchip) pchip by(country) 

line linear pchip year || scatter gini year, ms(Oh)  

Upvotes: 1

Related Questions