Reputation: 39
There is a dataset with 3 variables - ID, Wage and Year, it is an unbalanced panel.There are 2 problems:
For example, if a guy with ID = 1 didn't report his Wage in a Year = 2010 (and therefore there is no observation with Year = 2010 and ID = 1), I want to drop all data for ID = 1.
It seems like a popular question, but all I found on Google and StackOverflow were multiple solutions for Stata and none for SPSS.
UPDATE: I managed to solve this problem using COUNTIF Excel function. I created a variabe that counted amount of times certain ID appeared in dataset and kept obseravtions for which this function=amount of years, thus dropping unbalanced IDs. However, i'm still in dire need of solution to the second problem :)
For example, if a guy with ID = 1 reported Wage = 0 in a Year = 2010, I want to drop all data for ID = 1.
If there is a filling command in SPSS that balances unbalanced panel with missing values, it seems like solution to second problem is a solution to the first one at the same time.
UPDATE 2: I solved this problem as well using COUNTIFS on Wage and ID. Excel is omnipotent, praise Excel.
Upvotes: 1
Views: 1295
Reputation: 7164
I don't know what the data is used for, but if it's something important, you should seriously reconsider deleting the observations with missing variables.
Often, especially in data on wages, a missing value tells you something about the value that should have been recorded (Link to Wikipedia, Keywords: MAR, MCAR, MNAR)). There are no easy ways to get rid of this bias in your sample, but simply deleting the observation is not a serious option. There are algorithms that manage to cleverly impute missing values, based on the other values in the dataset.
If you'd like, I could invest a bit more time and help you find a suitable algorithm to impute the missing values..
Upvotes: 0
Reputation: 11310
This will solve both tasks:
recode Wage (0=sysmis).
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Wage_nmiss=NMISS(Wage).
select if Wage_nmiss=0.
execute.
Upvotes: 1