Reputation: 33
I have a question regarding the following problem. I have data that looks like this:
State Total
AZ 1000
AZ 1000
AZ -
CA -
CA 4000
That is, I have missing data for the variable "total" for some observations. I would like to replace the missing values with total from non-missing observations.
Desired output
enter code here
State Total
AZ 1000
AZ 1000
AZ **1000**
CA **4000**
CA 4000
Any ideas?
Upvotes: 0
Views: 310
Reputation: 4554
Merge to impute with mean.
proc sql;
select a.state,coalesce(a.total,b.total) from have a left join (select distinct state,mean(total) as total from have group by state) b on a.state=b.state;
quit;
Upvotes: 0
Reputation: 21294
If your values are constant use PROC STANDARDIZE to replace the missing values.
Proc stdize data=have out=want missing=mean reponly;
By state;
Var amount;
Run;
Upvotes: 1
Reputation: 1396
Here is a solution I came up with. Surely there are more elegant ways to do this, but this is tested and works.
Idea is sort the data so that missing values are after proper ones. Then loop though each state. Save the 'total' value from first observation and apply it to any missing cells in the state.
data begin;
length state $3 total 5;
input state Total;
cards;
AZ 1000 @@
AZ 1000 @@
AZ @@
CA @@
CA 4000 @@
OZ @@
OZ 3000 @@
OZ @@
;
run;
proc sort data=begin; by state descending total ; run;
data Filled;
set begin;
by state; /*Handle each state as own subset*/
retain memory; /*Keeps the 'memory' from prior observations and not from column */
if first.state then memory=total; /*Save the value to temporary column*/
if total=. then total=memory; /*Fill blanks*/
drop memory; /*Cleanup*/
run;
Upvotes: 0