Reputation: 23
My goal is to combine these tables into one without having to manually run my macro each time for each column.
The code I currently have with me is the following:
%macro task_Oct(set,col_name);
data _type_;
set &set;
call symputx('col_type', vtype(&col_name));
run;
proc sql;
create table work.oct27 as
select "&col_name" as variable,
"&col_type" as type,
nmiss(&col_name) as missing_val,
count(distinct &col_name) as distinct_val
from &set;
quit;
%mend task_Oct;
%task_Oct(sashelp.cars,Origin)
The above code gives me the following output:
|Var |Type |missing_val|distinct_val|
|Origin|Character|0 | 3 |
But the sashelp.cars data sheet has 15 columns and so I would like to output a new data sheet which has 15 rows with 4 columns.
I would like to get the following combined table as the output of my code:
|Var |Type |missing_val|distinct_val|
|Make |Character|0 | 38 |
|Model |Character|0 | 425 |
|Type |Character|0 | 6 |
|Origin|Character|0 | 3 |
...
...
...
Since I'm using a macro, I could run my code 15 different times by manually entering the names of the columns and then merging the tables into 1; and it wouldn't be a problem. But what if I have a table with 100s of columns? I could use some loop statement but I'm not sure how to go about that in this case. Help would be appreciated. Thank you.
Upvotes: 0
Views: 142
Reputation: 51566
The main output you appear to want can be generated directly by PROC FREQ with the NLEVELS option. If you want to add in the variable type then just merge it with the output of PROC CONTENTS.
ods exclude all;
ods output nlevels=nlevels;
proc freq data=sashelp.cars nlevels;
tables _all_ / noprint ;
run;
ods select all;
proc contents data=sashelp.cars noprint out=contents;
run;
proc sql;
create table want(drop=table:) as
select c.varnum,c.name,c.type,n.*
from contents c inner join nlevels n
on c.name=n.TableVar
order by varnum
;
quit;
Result
NNon
NMiss Miss
Obs VARNUM NAME TYPE NLevels Levels Levels
1 1 Make 2 38 0 38
2 2 Model 2 425 0 425
3 3 Type 2 6 0 6
4 4 Origin 2 3 0 3
5 5 DriveTrain 2 3 0 3
6 6 MSRP 1 410 0 410
7 7 Invoice 1 425 0 425
8 8 EngineSize 1 43 0 43
9 9 Cylinders 1 8 1 7
10 10 Horsepower 1 110 0 110
11 11 MPG_City 1 28 0 28
12 12 MPG_Highway 1 33 0 33
13 13 Weight 1 348 0 348
14 14 Wheelbase 1 40 0 40
15 15 Length 1 67 0 67
The NMissLevels variable counts the number of distinct types of missing values.
If instead you want to count the number of observations with (any) missing value you will need to use code generation. So use the CONTENTS data to generate an SQL query to generate all of the counts you want into a single observation with only one pass through the data. You can then transpose that to make it usable for re-merging with the CONTENTS data.
filename code temp;
data _null_;
set contents end=eof;
length nliteral $65 dsname $80;
nliteral=nliteral(name);
dsname = catx('.',libname,nliteral(memname));
file code;
if _n_=1 then put 'create table counts as select ' / ' ' @ ;
else put ',' @;
put 'nmiss(' nliteral ') as missing_' varnum
/',count(distinct ' nliteral ') as distinct_' varnum
;
if eof then put 'from ' dsname ';';
run;
proc sql;
%include code /source2;
quit;
proc transpose data=counts out=count2 name=name ;
run;
proc sql ;
create table want as
select c.varnum, c.name, c.type
, m.col1 as nmissing
, d.col1 as ndistinct
from contents c
left join count2 m on m.name like 'missing%' and c.varnum=input(scan(m.name,-1,'_'),32.)
left join count2 d on d.name like 'distinct%' and c.varnum=input(scan(d.name,-1,'_'),32.)
order by varnum
;
quit;
Result
Obs VARNUM NAME TYPE nmissing ndistinct
1 1 Make 2 0 38
2 2 Model 2 0 425
3 3 Type 2 0 6
4 4 Origin 2 0 3
5 5 DriveTrain 2 0 3
6 6 MSRP 1 0 410
7 7 Invoice 1 0 425
8 8 EngineSize 1 0 43
9 9 Cylinders 1 2 7
10 10 Horsepower 1 0 110
11 11 MPG_City 1 0 28
12 12 MPG_Highway 1 0 33
13 13 Weight 1 0 348
14 14 Wheelbase 1 0 40
15 15 Length 1 0 67
Upvotes: 1