I have tables in SAS Enterprise Guide like below: TABLE 1 COL1 | COL2 | ... | COLn -----|------|------|------- 123 | | ... | xxx | AAA | ... | xxx 122 | BCC | ... | xxx ... | ... | ... | xxx And also 2 other tables like below: TABLE 2 COL1 | ...| COLn -----|----|------ 998 | ...| xxx 999 | ...| xxx 001 | ...| xxx ... | ...| ... TABLE 3 COL8 | ...| COLn -----|----|------ 117 | ...| xxx 906 | ...| xxx 201 | ...| xxx ... | ...| ... As you can see TABLE 1 has missing values and I need to: fill all missing values of NUMERIC variables by using 0 in each column of TABLE 1 if this column also exists in TABLE 2 or TABLE 3 So, as a result I need something like below, because COL1 (where missing is fill by 0) is in TABLE 2 and COL2 is neither in TABLE 2 nor in TABLE 3, moreover COL2 is not numeric COL1 | COL2 | ... | COLn -----|------|------|------- 123 | | ... | xxx 0 | AAA | ... | xxx 122 | BCC | ... | xxx ... | ... | ... | xxx Only TABLE 1 is relevant, if some column from TABLE 1 is numeric, has missing and exists in TABLE 2 or TABLE 3 fill this column by 0 How can I do that in SAS Enterprise Guide ?

Reputation: 2342

How to fill missing values in numeric columns by using 0 in one table base on existing these columns in other tables in SAS Enterprise Guide?

I have tables in SAS Enterprise Guide like below:

TABLE 1

COL1 | COL2 | ...  | COLn
-----|------|------|-------
123  |      | ...  | xxx
     | AAA  | ...  | xxx
122  | BCC  | ...  | xxx
...  | ...  | ...  | xxx

And also 2 other tables like below:

TABLE 2

COL1 | ...| COLn
-----|----|------
998  | ...| xxx
999  | ...| xxx
001  | ...| xxx
...  | ...| ...

TABLE 3

COL8 | ...| COLn
-----|----|------
117  | ...| xxx
906  | ...| xxx
201  | ...| xxx
...  | ...| ...

As you can see TABLE 1 has missing values and I need to:

fill all missing values of NUMERIC variables by using 0 in each column of TABLE 1 if this column also exists in TABLE 2 or TABLE 3

So, as a result I need something like below, because COL1 (where missing is fill by 0) is in TABLE 2 and COL2 is neither in TABLE 2 nor in TABLE 3, moreover COL2 is not numeric

COL1 | COL2 | ...  | COLn
-----|------|------|-------
123  |      | ...  | xxx
0    | AAA  | ...  | xxx
122  | BCC  | ...  | xxx
...  | ...  | ...  | xxx

Only TABLE 1 is relevant, if some column from TABLE 1 is numeric, has missing and exists in TABLE 2 or TABLE 3 fill this column by 0

How can I do that in SAS Enterprise Guide ?

Upvotes: 1

Answers (4)

Negdo

Reputation: 532

This is another solution using a macro. It is not as elegant as what Kermit did, but it is a bit easier to add another table with columns to 'demissing'.

Those are our tables.

data table1;
    input col1 col2 $ col3 col4 col5 col6 col7 col8 col9 col10;
    datalines;
123  . . 5 3 71 34 6 34 64
. ABC 23 5 . 77 5 . 34 .
126 GRE . 5 . . 34 6 34 51
128 . . 5 65 77 . 6 . 64
. VDA 23 . 65 . . 6 . .
131 . 23 5 . 77 34 6 . 17
. FRR 23 . . 1 34 6 . 12
151 . . 5 65 . 34 6 34 .
. . 23 5 2 . 34 6 . 6
. BTR . . 6 77 34 . 34 64
162 . . 5 65 77 . . 4 57
;
run;

data table2;
    input col1 col3 col5 col7 $;
    datalines;
1 4 5 B
44 52 732 C
;
run;

data table3;
    input col4 col5 col11;
    datalines;
55 64 77
1 2 3
;
run;

We create a macro for fixing missing values.

%macro Fix_missings(table);
proc sql noprint;
    select distinct upcase(name) into :col1- 
    from  sashelp.vcolumn
        where upcase(memname) = "%upcase(&table)" and type = 'num'
    having upcase(name) in 
    (select distinct upcase(name) from sashelp.vcolumn
        where upcase(memname) = 'TABLE1' and type = 'num');
quit;
data table1;
    set table1;
    %do i=1 %to &sqlobs;
        if &&col&i = . then do;
            &&col&i = 0;
        end;
    %end;
run;
%mend Fix_missings;

And then we call that macro for those tables that contain our column names.

%Fix_missings(table2);
%Fix_missings(table3);

Only those columns were fixed that are in our table2 and table3. And any column that is char type is ignored, both if it's in primary table (table1) or in secondarly one (table2 or table3)

Upvotes: 0

Richard

Reputation: 27508

Here is an approach that does not rely on the macro system.

data want;
  set table1;
  array num  _numeric_;
  array num0  [0:10000] _temporary_;

  if _n_ = 1 then do;
    __tid = open ('table2');
    do _i_ = 1 to attrn(__tid,'NVAR');
      _name_ = lowcase(varname(__tid,_i_));
      do _j_ = 1 to dim(num) until (_name_ = lowcase(vname(num(_j_)))); end;
      num0[_j_] = 1;
    end;
    __tid = open ('table3');
    do _i_ = 1 to attrn(__tid,'NVAR');
      _name_ = lowcase(varname(__tid,_i_));
      do _j_ = 1 to dim(num) until (_name_ = lowcase(vname(num(_j_)))); end;
      num0[_j_] = 1;
    end;
    drop __tid _i_ _name_ _j_;
  end;

  do _i_ = 1 to dim(num);
    if missing(num(_i_)) and num0(_i_) then num(_i_) = 0;
  end;
run;

The same idea using Proc DS2 and a hash.

proc ds2;
  data want2 (overwrite=yes);
    vararray double num[*] double;
    declare char(32) column_name; drop column_name;
    declare package hash lookup();

    method init();
      lookup.keys([column_name]);
      lookup.dataset('
        { select distinct lowcase(COLUMN_NAME) as column_name
          from dictionary.columns 
          where TABLE_SCHEM = ''WORK''
            and TABLE_NAME in
              ( ''TABLE2''
              , ''TABLE3''
              )
        }
      ');
      lookup.defineDone();
    end;

    method run();
      set table1;

      do int _i_ = 1 to dim(num);
        declare char(32) name;
        column_name = vname(num[_i_]);
        if missing(num[_i_]) and lookup.check()=0 then num[_i_] = 0;
      end;

    end;
  enddata;
  run;
quit;

%let syslast = want2;

Upvotes: 2

Kermit

Reputation: 3117

Create sample data sets

data tab1;
infile datalines delimiter='|' dsd;
input col1 col2 $ col3 col4 $ col5 col6;
datalines;
1|ABC|.|ABC|5|9
.|DEF|8||10|.
;
run;

data tab2;
format col1 8. col2 $8. col5 8.;
stop;
run;

data tab3;
format col3 8. col4 $8.;
stop;
run;

col1 col2 col3 col4 col5 col6
 1    ABC  .    ABC   5   9
 .    DEF  8         10   .

If you are only interested in the number of missing values for numeric variables, then a single call to the MEANS procedure computes the answer.
We use the ods output statement to output the results to a SAS table tofill. We use the stackods option, it allows the data set to resemble the default printed output from PROC MEANS.

proc means data=tab1 nmiss n stackods;
    ods output summary=tofill(where=(nmiss>0));
run;

Then we use the same kind of approach as your last questions here and here to match retrieve the columns that exists in either TAB2 or TAB3. We fill the column names in a macro variable tofill

proc sql noprint;
    create table names as 
    select distinct upcase(name) as name
    from sashelp.vcolumn
    where memname in ('TAB2', 'TAB3');

    select upcase(variable) into :tofill separated by ' '
    from tofill
    where upcase(variable) in (select name from names)
    ;
quit;

Finally, we replace the missing values of the columns we are interested in by 0

data want;
    set tab1;
    array cols &tofill.;
    do over cols;
        if missing(cols) then cols = 0;
    end;
run;

col1 col2 col3 col4 col5 col6
 1    ABC   0  ABC   5    9
 0    DEF   8       10    .

As desired

missing values of col1 and col3 are replaced by 0 because they are both numeric columns and exists in either tab2 or tab3
missing value of col4 remain missing as it is a character variable
missing value of col6 remain missing as this column does not exist in either tab2 or tab3.

Upvotes: 1

Konstantinos Kontaras

Reputation: 1

Maybe you could use default values in your objects so when you construct it (right before storing to db) it would fill all empty fields with default values.

Upvotes: -1

How to fill missing values in numeric columns by using 0 in one table base on existing these columns in other tables in SAS Enterprise Guide?

Answers (4)

Related Questions