User1414
User1414

Reputation: 33

Replacing Missing Value with non-missing in the same dataset

I have a question regarding the following problem. I have data that looks like this:

State Total
AZ    1000
AZ    1000
AZ    -
CA    -
CA    4000

That is, I have missing data for the variable "total" for some observations. I would like to replace the missing values with total from non-missing observations.

Desired output

enter code here
State Total
AZ    1000
AZ    1000
AZ    **1000**
CA    **4000**
CA    4000

Any ideas?

Upvotes: 0

Views: 310

Answers (3)

Shenglin Chen
Shenglin Chen

Reputation: 4554

Merge to impute with mean.

proc sql;
   select a.state,coalesce(a.total,b.total) from have a left join (select distinct state,mean(total) as total from have group by state) b on a.state=b.state;
quit; 

Upvotes: 0

Reeza
Reeza

Reputation: 21294

If your values are constant use PROC STANDARDIZE to replace the missing values.

Proc stdize data=have out=want missing=mean reponly;
By state;
Var amount;
Run;

Upvotes: 1

pinegulf
pinegulf

Reputation: 1396

Here is a solution I came up with. Surely there are more elegant ways to do this, but this is tested and works.

Idea is sort the data so that missing values are after proper ones. Then loop though each state. Save the 'total' value from first observation and apply it to any missing cells in the state.

data begin;
    length state $3 total 5;
    input state Total;
    cards;
    AZ    1000 @@
    AZ    1000 @@
    AZ     @@
    CA     @@
    CA    4000 @@
    OZ @@
    OZ    3000 @@
    OZ @@
    ;
run;

proc sort data=begin; by state descending total ; run;

data Filled;
    set begin;
    by state; /*Handle each state as own subset*/
    retain memory; /*Keeps the 'memory' from prior observations and not from column */
    if first.state then memory=total; /*Save the value to temporary column*/
    if total=. then total=memory; /*Fill blanks*/

    drop memory; /*Cleanup*/
run;

Upvotes: 0

Related Questions