Reputation: 187
I have a data set that contains several numbered columns, Code1, Code2, ... Code12
. I also have a set of values for those codes %myCodes = ('ABC','DEF','GHI',etc.)
.
What I want to do is filter so that I include only rows where at least one of the Code1 through Code12 columns contains a value from myCodes. I realize that I could just do a very long OR condition [e.g. (Code1 in &myCodes or Code2 in &myCodes or ...)
], but I was wondering if there is a more efficient way to do it.
Upvotes: 0
Views: 1622
Reputation: 159
I think I would concatenate col1-12 into a variable then use a regular expression:
length long_text $32767 ;
long_text = catx('~', of col:);
found = prxmatch('/(abc|def|ghi)/oi', long_text);
Obviously, this requires to rework your list of codes to search for, but I think this pretty efficient until you don't want to identify the variable(s) that contain(s) the code!
Upvotes: 2
Reputation: 51621
It really depends on what you mean by efficient.
In general you will have to test all of the CODEx variables to be sure that none of them contain a code of interest. So your proposed long expression might be the most efficient it terms of performance.
You could use a code generator (such as a SAS macro) to help you generate the repetitive code. That would be more efficient for the programmer even if it has no impact on the actual execution time.
You could use an ARRAY to allow you to loop of the set of CODEx variables. That does offer an opportunity to stop once at least one is found which might be more efficient.
array code[12];
do index=1 to dim(code) until(found);
found= code[index] in %mycodes ;
end;
if found;
But there is extra work required to implement the looping so for the cases that do not match it might actually take longer. Plus you could not use that in a WHERE statement.
Depending on the codes it might be better to loop over the codes in %MYCODES and test if they appear in any of the CODEx variables instead. You will need to know how many codes in in %MYCODES for this to work.
array code[12] ;
array mycodes [3] $3 _temporary_ %mycodes;
do index=1 to dim(mycodes) until(found);
found = mycodes[index] in code ;
end;
if found;
Upvotes: 1
Reputation: 12909
You could put your columns that you want to select within a macro variable and loop over them all.
%let cols = col1 col2 col3 col4;
%let myCodes = ('ABC','DEF','GHI');
%macro condition;
%scan(&cols, 1) IN &myCodes
%do i = 2 %to %sysfunc(countw(&cols));
OR %scan(&cols, &i) IN &myCodes
%end;
%mend;
data want;
set have;
where %condition;
run;
If you look at %condition
, it has all of your filters:
%put %condition;
col1 IN ('ABC','DEF','GHI') OR col2 IN ('ABC','DEF','GHI') OR col3 IN ('ABC','DEF','GHI') OR col4 IN ('ABC','DEF','GHI')
Upvotes: 1