Reputation: 2342
I have tables in SAS Enterprise Guide like below:
TABLE 1
COL1 | COL2 | ... | COLn
-----|------|------|-------
123 | | ... | xxx
| AAA | ... | xxx
122 | BCC | ... | xxx
... | ... | ... | xxx
And also 2 other tables like below:
TABLE 2
COL1 | ...| COLn
-----|----|------
998 | ...| xxx
999 | ...| xxx
001 | ...| xxx
... | ...| ...
TABLE 3
COL8 | ...| COLn
-----|----|------
117 | ...| xxx
906 | ...| xxx
201 | ...| xxx
... | ...| ...
As you can see TABLE 1 has missing values and I need to:
So, as a result I need something like below, because COL1 (where missing is fill by 0) is in TABLE 2 and COL2 is neither in TABLE 2 nor in TABLE 3, moreover COL2 is not numeric
COL1 | COL2 | ... | COLn
-----|------|------|-------
123 | | ... | xxx
0 | AAA | ... | xxx
122 | BCC | ... | xxx
... | ... | ... | xxx
Only TABLE 1 is relevant, if some column from TABLE 1 is numeric, has missing and exists in TABLE 2 or TABLE 3 fill this column by 0
How can I do that in SAS Enterprise Guide ?
Upvotes: 1
Views: 583
Reputation: 532
This is another solution using a macro. It is not as elegant as what Kermit did, but it is a bit easier to add another table with columns to 'demissing'.
Those are our tables.
data table1;
input col1 col2 $ col3 col4 col5 col6 col7 col8 col9 col10;
datalines;
123 . . 5 3 71 34 6 34 64
. ABC 23 5 . 77 5 . 34 .
126 GRE . 5 . . 34 6 34 51
128 . . 5 65 77 . 6 . 64
. VDA 23 . 65 . . 6 . .
131 . 23 5 . 77 34 6 . 17
. FRR 23 . . 1 34 6 . 12
151 . . 5 65 . 34 6 34 .
. . 23 5 2 . 34 6 . 6
. BTR . . 6 77 34 . 34 64
162 . . 5 65 77 . . 4 57
;
run;
data table2;
input col1 col3 col5 col7 $;
datalines;
1 4 5 B
44 52 732 C
;
run;
data table3;
input col4 col5 col11;
datalines;
55 64 77
1 2 3
;
run;
We create a macro for fixing missing values.
%macro Fix_missings(table);
proc sql noprint;
select distinct upcase(name) into :col1-
from sashelp.vcolumn
where upcase(memname) = "%upcase(&table)" and type = 'num'
having upcase(name) in
(select distinct upcase(name) from sashelp.vcolumn
where upcase(memname) = 'TABLE1' and type = 'num');
quit;
data table1;
set table1;
%do i=1 %to &sqlobs;
if &&col&i = . then do;
&&col&i = 0;
end;
%end;
run;
%mend Fix_missings;
And then we call that macro for those tables that contain our column names.
%Fix_missings(table2);
%Fix_missings(table3);
Only those columns were fixed that are in our table2 and table3. And any column that is char type is ignored, both if it's in primary table (table1) or in secondarly one (table2 or table3)
Upvotes: 0
Reputation: 27508
Here is an approach that does not rely on the macro system.
data want;
set table1;
array num _numeric_;
array num0 [0:10000] _temporary_;
if _n_ = 1 then do;
__tid = open ('table2');
do _i_ = 1 to attrn(__tid,'NVAR');
_name_ = lowcase(varname(__tid,_i_));
do _j_ = 1 to dim(num) until (_name_ = lowcase(vname(num(_j_)))); end;
num0[_j_] = 1;
end;
__tid = open ('table3');
do _i_ = 1 to attrn(__tid,'NVAR');
_name_ = lowcase(varname(__tid,_i_));
do _j_ = 1 to dim(num) until (_name_ = lowcase(vname(num(_j_)))); end;
num0[_j_] = 1;
end;
drop __tid _i_ _name_ _j_;
end;
do _i_ = 1 to dim(num);
if missing(num(_i_)) and num0(_i_) then num(_i_) = 0;
end;
run;
The same idea using Proc DS2
and a hash.
proc ds2;
data want2 (overwrite=yes);
vararray double num[*] double;
declare char(32) column_name; drop column_name;
declare package hash lookup();
method init();
lookup.keys([column_name]);
lookup.dataset('
{ select distinct lowcase(COLUMN_NAME) as column_name
from dictionary.columns
where TABLE_SCHEM = ''WORK''
and TABLE_NAME in
( ''TABLE2''
, ''TABLE3''
)
}
');
lookup.defineDone();
end;
method run();
set table1;
do int _i_ = 1 to dim(num);
declare char(32) name;
column_name = vname(num[_i_]);
if missing(num[_i_]) and lookup.check()=0 then num[_i_] = 0;
end;
end;
enddata;
run;
quit;
%let syslast = want2;
Upvotes: 2
Reputation: 3117
Create sample data sets
data tab1;
infile datalines delimiter='|' dsd;
input col1 col2 $ col3 col4 $ col5 col6;
datalines;
1|ABC|.|ABC|5|9
.|DEF|8||10|.
;
run;
data tab2;
format col1 8. col2 $8. col5 8.;
stop;
run;
data tab3;
format col3 8. col4 $8.;
stop;
run;
col1 col2 col3 col4 col5 col6
1 ABC . ABC 5 9
. DEF 8 10 .
If you are only interested in the number of missing values for numeric variables, then a single call to the MEANS procedure computes the answer.
We use the ods output
statement to output the results to a SAS table tofill
.
We use the stackods
option, it allows the data set to resemble the default printed output from PROC MEANS
.
proc means data=tab1 nmiss n stackods;
ods output summary=tofill(where=(nmiss>0));
run;
Then we use the same kind of approach as your last questions here and here to match retrieve the columns that exists in either TAB2
or TAB3
. We fill the column names in a macro variable tofill
proc sql noprint;
create table names as
select distinct upcase(name) as name
from sashelp.vcolumn
where memname in ('TAB2', 'TAB3');
select upcase(variable) into :tofill separated by ' '
from tofill
where upcase(variable) in (select name from names)
;
quit;
Finally, we replace the missing values of the columns we are interested in by 0
data want;
set tab1;
array cols &tofill.;
do over cols;
if missing(cols) then cols = 0;
end;
run;
col1 col2 col3 col4 col5 col6
1 ABC 0 ABC 5 9
0 DEF 8 10 .
As desired
col1
and col3
are replaced by 0 because they are both numeric columns and exists in either tab2
or tab3
col4
remain missing as it is a character variablecol6
remain missing as this column does not exist in either tab2
or tab3
.Upvotes: 1
Reputation: 1
Maybe you could use default values in your objects so when you construct it (right before storing to db) it would fill all empty fields with default values.
Upvotes: -1