saslearner
saslearner

Reputation: 23

Could someone please explain what the following lines of this code do?

This is the code that I have with me:

%let data=sashelp.cars;

    proc transpose data=&data(obs=0) out=names;
       var _all_;
       run;
    proc sql;
       select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
       run;
    
    data missing;
       set &data;
       _BIGN = 1;
       /*%m_expand_varlist(data=&data,expr=cats('_',_name_,'=missing(',_name_,');'));*/
       &stmts;
       keep _:;
       run;
    proc summary data=missing;
       var _numeric_;
       output out=smry sum=;
       run;
    proc transpose data=smry(drop=_type_ _freq_) out=smry_;
       run;
   

The goal of this code is to output the number of missing values for both character and numeric variables in the data set. The code accomplishes that objective but I have difficulty in understanding the purpose of certain lines in the code.

May I know what the following part of the code does?

 select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by '  ' from names;

I understand that the into part just stores the value into the macro variable stmts but what does "separated by ' ' from names" in the above line mean?

data missing;
       set &data;
       _BIGN = 1;
       &stmts;
       keep _:;
       run;

And in the above portion of the code, what is the purpose of "keep :"? What does the ":" do in that? And is the "_BIGN = 1" necessary?

And also in the final output table called smry_, I get underscores before the names of the variables. But I don't need these underscores. What can I do to remove them? When I removed the underscore after the "keep :", the underscores in the smry table went away but I was left with only 10 rows instead of 15. Help would be appreciated. Thank you.

Upvotes: 0

Views: 167

Answers (4)

data _null_
data _null_

Reputation: 9109

You could just rename the underscore variables back to original.

%let data=sashelp.heart;
proc transpose data=&data(obs=0) out=names;
   var _all_;
   run;
proc sql;
   select cats('Label _',_name_,'=',quote(_label_),';') into: lb_stmts separated by ' ' from names;
   select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
   select cats('_',_name_,'=',_name_) into: rn_stmts separated by ' ' from names;
   run;

options symbolgen=1;
data missingV / view=missingV;
   set &data;
   &stmts;
   &lb_stmts;
   rename &rn_stmts;
   keep _:;
   run;
options symbolgen=0;
ods output summary=summary;
proc means data=missingV Sum Mean N STACKODSOUTPUT;
   var _numeric_;
   run;
ods output close;
proc print width=minimum label;
   label sum='#Missing' mean='%Missing';
   format sum 8. mean percentn8.1;
   run;

enter image description here

Upvotes: 0

saslearner
saslearner

Reputation: 23

I managed to remove the underscores in the final data set using the following code:

data _smry_(drop =  _name_);
set smry_;
name=compress(_name_, , 'kas');
run;

Upvotes: 0

gregor
gregor

Reputation: 304

As you know the "into" clause stores the values in the macro variable. The "separated by" leads to a list of values stored in the variable with spaces as delimiter here. If you don't use this you have only the value of the first row in your macro variable.

The ":" is a wildcard that means you keep all the variables starting with an underscore:

keep _:;

Upvotes: 1

PeterClemmensen
PeterClemmensen

Reputation: 4937

Before answering your questions, let me disclaim that this is not the way to go if you simply want a count of missing values for each numeric variable in your data.

However, this seems to be more of a practice assignment than an actual problem.

  1. The Separated By Clause simply inserts the specific string between the values in the macro variable if the data source has >1 items. In this case the names data set has 15 items, so all 15 values are listed with a few spaces between them.
  2. The colon operator in the keep statement tells the data step to keep only variables prefixed with an underscore.
  3. _BIGN is not strictly necessary. However, it seems that the author of the code wants a simple count of observations in the final data set. That is all it does.
  4. The underscores are applied to each variable name in the creation of the macro variable in. It is probably done to avoid conflicts between variable names (though this is technically still possible). Obviously, you can simply remove the underscore in the final data set.

Upvotes: 1

Related Questions