Samvel Avakyan

Reputation: 39

Recoding missing values into conditional means

Consider a dataset with four variables: RespondentID, Site, Year, and Price. I want to replace missing values in Price with the mean of Price for the same Year and Site.

For example, if the mean reported Price in Paris in 2009 was 22, I want to replace every missing Price value for which Year=2009 and Site=Paris with 22.

If I wanted to replace missing values with zero, for example, I would use the syntax RECODE Price (SYSMIS=0) (ELSE=COPY), but I have no idea how to implement the conditional mean.

I also know that one can find each conditional mean separately using temporary. select if Year=2009 and Site=Paris. compute NewVar=mean(Price). However, there are too many combinations of Site and Year in my dataset to do this manually. By the way, all the variables in the actual dataset are numeric (I don't know whether that matters).

Any idea how to do it in SPSS or maybe in some other statistics software?

Upvotes: 2

Views: 231

Answers (2)

eli-k

Reputation: 11310

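* Add the mean Price of each Site/Year group to every case as Price_mean.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=Site Year /Price_mean=MEAN(Price).
* Fill in missing Price values with the mean of their group.
IF MISSING(Price) Price=Price_mean.
EXECUTE.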
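
Since the question also asks about other software, here is a minimal sketch of the same logic in plain Python (standard library only). It is an illustration, not a translation of the SPSS internals: the function name `fill_with_group_means`, the use of `None` for a missing value, and the sample rows are all assumptions made for the example, and Site is shown as a string only for readability.

```python
from collections import defaultdict


def fill_with_group_means(rows, keys=("Site", "Year"), target="Price"):
    """Return copies of rows where a missing target (None) is replaced
    by the mean of the non-missing target values in the same group."""
    totals = defaultdict(float)
    counts = defaultdict(int)

    # First pass: accumulate sum and count of observed values per group.
    for row in rows:
        if row[target] is not None:
            group = tuple(row[k] for k in keys)
            totals[group] += row[target]
            counts[group] += 1

    # Second pass: fill missing values where a group mean exists.
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        group = tuple(row[k] for k in keys)
        if new_row[target] is None and counts[group] > 0:
            new_row[target] = totals[group] / counts[group]
        filled.append(new_row)
    return filled


# Hypothetical sample data, made up for illustration.
rows = [
    {"RespondentID": 1, "Site": "Paris", "Year": 2009, "Price": 20.0},
    {"RespondentID": 2, "Site": "Paris", "Year": 2009, "Price": 24.0},
    {"RespondentID": 3, "Site": "Paris", "Year": 2009, "Price": None},
    {"RespondentID": 4, "Site": "Rome", "Year": 2009, "Price": None},
]
result = fill_with_group_means(rows)
```

With this sample, respondent 3 gets the Paris 2009 mean of 22, while respondent 4 stays missing because no Price was observed for Rome in 2009. The same applies to any Site/Year group in which every Price is missing, since there is nothing to average.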

Upvotes: 2

David Marso

Reputation: 71

Look at the AGGREGATE and MATCH FILES commands (or RMV with SPLIT FILE). EDIT: Added example.

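DATASET NAME rawdata.
SORT CASES BY Year Site.
DATASET DECLARE agg.
* Write one row per Year/Site group with its mean Price to dataset agg.
AGGREGATE OUTFILE=agg /BREAK=Year Site /Mean_Price=MEAN(Price).
* Merge the group means back onto the raw data as a lookup table.
MATCH FILES FILE=rawdata /TABLE=agg /BY Year Site.
IF MISSING(Price) Price=Mean_Price.
EXECUTE.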

*ALTERNATIVELY.
DATASET NAME rawdata.
SORT CASES BY Year Site.
SPLIT FILE BY Year Site.
RMV /Mean_Price=SMEAN(Price).
IF MISSING(Price) Price=Mean_Price.
EXECUTE.
SPLIT FILE OFF.

Upvotes: 1
