MichaelZ
MichaelZ

Reputation: 11

SAS, keeping only columns that contain a certain character

I would like to know is it possible to perform an action that keeps only columns that contain a certain character.

For example, lets say that I have columns: name, surname, sex, age. I want to keep only columns that start with letter 's' (surname and sex). How do I do that?

Upvotes: 1

Views: 7888

Answers (4)

Richard
Richard

Reputation: 27536

For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.

This example demonstrates two, of many, ways that source code can be generated.

  • metadata table from target table, Proc CONTENTS
  • process metadata, Proc SQL
  • construct source code
    • Expectation of name lists < 64K
      • SQL INTO :<macro-variable> for source code expected to be < 64K characters
    • Very large name lists or robust
      • A macro that streams source code from metadata table

From a data set with 50,000 variables select the columns whose name contains 2912

data have;
  retain id 'HOOPLA12345' x1-x50000 .;
  stop;
run;

* obtain metadata of target table;
proc contents noprint data=have 
  out=varlist_table
    ( keep=name 
      where= (
        prxmatch('/x.*2912.*/',name)         /* name selection criteria */
      )
    );
run;

* Short lists;    
* construct source code for name list;
proc sql noprint;
  select name into :varlist separated by ' ' from varlist_table;

data want;
  set have (keep=&varlist);  /* apply generated source code */
run;

* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
  %local dsid index &column;
  %let dsid=%sysfunc(open(&data(keep=&column)));
  %if &dsid %then %do;
    %syscall SET(dsid);
    %do %while (0=%sysfunc(fetch(&dsid)));
      &&&column.  /* emit column value from table */
    %end;
    %let dsid = %sysfunc(close(&dsid));
  %end;
%mend;

options mprint;
data want2;
  set have (keep=
       /* stream source code as macro text emissions */
       %stream_column(data=varlist_table,column=name)
  );
run;

Upvotes: 1

PeterClemmensen
PeterClemmensen

Reputation: 4937

If you want do keep only variables that start with an s, then use name prefix list operator :.

data want;
   set have(keep=s:);
run;

Upvotes: 4

Reeza
Reeza

Reputation: 21294

There's several variations on how to filter out names.

For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:

_numeric_ : all numeric variables
_character_ : all character variables
_all_  : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix:  : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last. 
first-numeric-lastVar : variables that are numeric based on location between first and last variable

Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.

To filter all columns that have the word mpg for example:

*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn

where libname = 'SASHELP' and memname = 'CARS' 
and lowcase(name) like '%mpg%';
quit;

*check log for results;
%put &var_list;

*verification from original table;
proc contents data=sashelp.cars;
run;

*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;

Some more details are available in this blog post and here (documentation).

Upvotes: 4

Wilson Jimenez
Wilson Jimenez

Reputation: 60

It's possible. In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.

PROC SQL;
   SELECT 
    NAME
INTO:
    NMVAR /*    SAVE IN MACRO VARIABLE  */
FROM SASHELP.VCOLUMN
WHERE 
    LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
    MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
    SUBSTR(NAME,1,1) EQ "S";

RUN;

Upvotes: 1

Related Questions