Samvel Avakyan
Samvel Avakyan

Reputation: 39

Converting unbalanced panels into balanced/dropping multiple observations on single condition

There is a dataset with 3 variables - ID, Wage and Year, it is an unbalanced panel.There are 2 problems:

  1. I want to drop all data on such IDs for which there is a Year with no observations. Shortly, i want to convert my unbalanced panel into balanced dropping every id that creates this "unbalanceness".

For example, if a guy with ID = 1 didn't report his Wage in a Year = 2010 (and therefore there is no observation with Year = 2010 and ID = 1), I want to drop all data for ID = 1.

It seems like a popular question, but all I found on Google and StackOverflow were multiple solutions for Stata and none for SPSS.

UPDATE: I managed to solve this problem using COUNTIF Excel function. I created a variabe that counted amount of times certain ID appeared in dataset and kept obseravtions for which this function=amount of years, thus dropping unbalanced IDs. However, i'm still in dire need of solution to the second problem :)

  1. Second question is almost the same as the first one - I want to drop all data on such IDs for which there is a Year when they reported Wage = 0

For example, if a guy with ID = 1 reported Wage = 0 in a Year = 2010, I want to drop all data for ID = 1.

If there is a filling command in SPSS that balances unbalanced panel with missing values, it seems like solution to second problem is a solution to the first one at the same time.

UPDATE 2: I solved this problem as well using COUNTIFS on Wage and ID. Excel is omnipotent, praise Excel.

Upvotes: 1

Views: 1295

Answers (2)

KenHBS
KenHBS

Reputation: 7164

I don't know what the data is used for, but if it's something important, you should seriously reconsider deleting the observations with missing variables.

Often, especially in data on wages, a missing value tells you something about the value that should have been recorded (Link to Wikipedia, Keywords: MAR, MCAR, MNAR)). There are no easy ways to get rid of this bias in your sample, but simply deleting the observation is not a serious option. There are algorithms that manage to cleverly impute missing values, based on the other values in the dataset.

If you'd like, I could invest a bit more time and help you find a suitable algorithm to impute the missing values..

Upvotes: 0

eli-k
eli-k

Reputation: 11310

This will solve both tasks:

recode Wage (0=sysmis).
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Wage_nmiss=NMISS(Wage).
select if Wage_nmiss=0.
execute.

Upvotes: 1

Related Questions