saslearner
saslearner

Reputation: 23

How do I get the following combined table through SAS?

My goal is to combine these tables into one without having to manually run my macro each time for each column.

The code I currently have with me is the following:

%macro task_Oct(set,col_name);

data _type_;
set &set;
call symputx('col_type', vtype(&col_name));
run;

proc sql; 
create table work.oct27 as
select "&col_name" as variable, 
"&col_type" as type, 
            nmiss(&col_name) as missing_val, 
            count(distinct &col_name) as distinct_val
    from &set;
quit;

%mend task_Oct;

%task_Oct(sashelp.cars,Origin)

The above code gives me the following output:

|Var   |Type     |missing_val|distinct_val|
|Origin|Character|0          | 3          |

But the sashelp.cars data sheet has 15 columns and so I would like to output a new data sheet which has 15 rows with 4 columns.

I would like to get the following combined table as the output of my code:

|Var   |Type     |missing_val|distinct_val|
|Make  |Character|0          | 38         |
|Model |Character|0          | 425        |
|Type  |Character|0          | 6          |
|Origin|Character|0          | 3          |
...
...
...

Since I'm using a macro, I could run my code 15 different times by manually entering the names of the columns and then merging the tables into 1; and it wouldn't be a problem. But what if I have a table with 100s of columns? I could use some loop statement but I'm not sure how to go about that in this case. Help would be appreciated. Thank you.

Upvotes: 0

Views: 142

Answers (1)

Tom
Tom

Reputation: 51566

The main output you appear to want can be generated directly by PROC FREQ with the NLEVELS option. If you want to add in the variable type then just merge it with the output of PROC CONTENTS.

ods exclude all;
ods output nlevels=nlevels;
proc freq data=sashelp.cars nlevels;
  tables _all_ / noprint ;
run;
ods select all;
proc contents data=sashelp.cars noprint out=contents;
run;

proc sql;
create table want(drop=table:) as 
  select c.varnum,c.name,c.type,n.*
  from contents c inner join nlevels n
  on c.name=n.TableVar
  order by varnum
;
quit;

Result

                                                                    NNon
                                                       NMiss        Miss
Obs    VARNUM    NAME           TYPE     NLevels      Levels      Levels

  1       1      Make             2           38           0          38
  2       2      Model            2          425           0         425
  3       3      Type             2            6           0           6
  4       4      Origin           2            3           0           3
  5       5      DriveTrain       2            3           0           3
  6       6      MSRP             1          410           0         410
  7       7      Invoice          1          425           0         425
  8       8      EngineSize       1           43           0          43
  9       9      Cylinders        1            8           1           7
 10      10      Horsepower       1          110           0         110
 11      11      MPG_City         1           28           0          28
 12      12      MPG_Highway      1           33           0          33
 13      13      Weight           1          348           0         348
 14      14      Wheelbase        1           40           0          40
 15      15      Length           1           67           0          67

The NMissLevels variable counts the number of distinct types of missing values.

If instead you want to count the number of observations with (any) missing value you will need to use code generation. So use the CONTENTS data to generate an SQL query to generate all of the counts you want into a single observation with only one pass through the data. You can then transpose that to make it usable for re-merging with the CONTENTS data.

filename code temp;
data _null_;
  set contents end=eof;
  length nliteral $65 dsname $80;
  nliteral=nliteral(name);
  dsname = catx('.',libname,nliteral(memname));
  file code;
  if _n_=1 then put 'create table counts as select ' / ' ' @ ;
  else put ',' @;
  put 'nmiss(' nliteral ') as missing_' varnum 
    /',count(distinct ' nliteral ') as distinct_' varnum
  ;
  if eof then put 'from ' dsname ';';
run;
proc sql;
%include code /source2;
quit;
proc transpose data=counts out=count2 name=name ;
run;

proc sql ;
create table want as 
  select c.varnum, c.name, c.type 
       , m.col1 as nmissing
       , d.col1 as ndistinct
  from contents c 
  left join count2 m on m.name like 'missing%' and c.varnum=input(scan(m.name,-1,'_'),32.)
  left join count2 d on d.name like 'distinct%' and c.varnum=input(scan(d.name,-1,'_'),32.)
  order by varnum
;
quit;

Result

Obs    VARNUM    NAME           TYPE    nmissing    ndistinct

  1       1      Make             2         0           38
  2       2      Model            2         0          425
  3       3      Type             2         0            6
  4       4      Origin           2         0            3
  5       5      DriveTrain       2         0            3
  6       6      MSRP             1         0          410
  7       7      Invoice          1         0          425
  8       8      EngineSize       1         0           43
  9       9      Cylinders        1         2            7
 10      10      Horsepower       1         0          110
 11      11      MPG_City         1         0           28
 12      12      MPG_Highway      1         0           33
 13      13      Weight           1         0          348
 14      14      Wheelbase        1         0           40
 15      15      Length           1         0           67

Upvotes: 1

Related Questions