Reputation: 489
I have the following situation.
The strings contain a number of patterns, and I need to find and select one. I have a PRXPARSE pattern that will find any one of the given patterns. It returns the first match it finds. I would like to be able to set a priority, i.e. scan the entire string for pattern 1 and, if found, stop and return it; otherwise scan the string for pattern 2, and so on.
This is the current code:
data have;
infile datalines delimiter=':' truncover;
informat target $50.;
input target;
datalines;
aaa
bbb
aaa bbb
bbb aaa
ccc aaa
ccc bbb
bbb ccc
;
run;
data want;
set have;
RE = PRXPARSE("/aaa|bbb/");
CALL prxsubstr(RE,STRIP(target),start,length);
IF START GT 0 then DO;
TEST_STR = substrn(STRIP(target),start,length);
end;
ELSE DO;
TEST_STR = STRIP(target);
end;
drop re start length;
run;
And this is the current output:
target | TEST_STR
aaa | aaa
bbb | bbb
aaa bbb | aaa
bbb aaa | bbb
ccc aaa | aaa
ccc bbb | bbb
bbb ccc | bbb
Let's say I need pattern bbb to have priority, so that the third line, 'aaa bbb', returns 'bbb' rather than 'aaa'.
Short of having multiple PRXPARSE statements and only calling the next one if the previous one has failed, how can I get this to work?
Thank you Ben
Upvotes: 0
Views: 315
Reputation: 27516
For clarity about what is being done, I would keep the prioritized pattern-matching steps separate.
The regular expression alternation operator | is essentially an or operator for matching criteria. In regex pattern matching, the target string is scanned left to right by default, and at each position every alternative is tried. Together, these two facts mean the default behavior returns whichever alternative matches at the leftmost position, regardless of the order in which the alternatives are listed, and hence your question.
As for prioritized match attempts, I would recommend using the select statement:
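To see why reordering the alternatives does not help, here is a minimal sketch using the same CALL PRXSUBSTR and SUBSTRN calls as the question, with bbb listed first in the pattern. The variable name found is only illustrative.

```sas
data _null_;
  /* bbb is listed first, but the scan still starts at position 1 */
  rx = prxparse("/bbb|aaa/");
  call prxsubstr(rx, "aaa bbb", start, length);
  /* extract the matched text from the literal source string */
  found = substrn("aaa bbb", start, length);
  /* writes start=1 length=3 found=aaa to the log */
  put start= length= found=;
run;
```

At position 1 the bbb alternative fails and the aaa alternative succeeds, so the match is reported there and the later bbb is never reached.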
data want;
set have;
rx1 = prxparse ("/\b(bbb)\b/"); /* bbb as a word (\b) is first priority */
rx2 = prxparse ("/\b(aaa)\b/"); /* aaa as a word (\b) is second priority */
select;
when (prxmatch(rx1, target)) matched_with = prxposn(rx1,1,target);
when (prxmatch(rx2, target)) matched_with = prxposn(rx2,1,target);
otherwise matched_with = '*no match*';
end;
drop rx:;
run;
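If the list of patterns grows, the same priority order can also be driven by a loop instead of one when clause per pattern. This is only a sketch, assuming the have data set from the question; the data set name want_loop, the array rx and the counter i are names made up for illustration.

```sas
data want_loop;
  set have;
  /* temporary arrays are retained, so the patterns are parsed only once */
  array rx[2] _temporary_;
  if _n_ = 1 then do;
    rx[1] = prxparse("/\b(bbb)\b/");  /* first priority  */
    rx[2] = prxparse("/\b(aaa)\b/");  /* second priority */
  end;
  length matched_with $50;
  matched_with = '*no match*';
  /* try the patterns in priority order and stop at the first hit */
  do i = 1 to dim(rx);
    if prxmatch(rx[i], target) then do;
      matched_with = prxposn(rx[i], 1, target);
      leave;
    end;
  end;
  drop i;
run;
```

Adding a third priority then means enlarging the array and adding one prxparse call, while the matching logic stays unchanged.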
It might be possible to achieve prioritized matching with a single complicated pattern, but the benefit of coding it that way is less important than maintaining a clear and understandable code base. The complicated pattern would probably involve a negative lookahead expression.
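For comparison, one way such a single pattern could look is sketched below. It is an illustration rather than a recommendation, and it assumes the have data set from the question; the data set name want_single is made up. The idea is that aaa is accepted only when no word bbb follows it later in the string, while a bbb to the left of aaa is found first anyway because of the left-to-right scan.

```sas
data want_single;
  set have;
  /* bbb always qualifies; aaa qualifies only if the negative
     lookahead (?!...) finds no word bbb further to the right */
  rx = prxparse("/\b(bbb|aaa(?!.*\bbbb\b))\b/");
  length matched_with $50;
  if prxmatch(rx, target) then matched_with = prxposn(rx, 1, target);
  else matched_with = '*no match*';
  drop rx;
run;
```

With the sample data this should give bbb for both 'aaa bbb' and 'bbb aaa', and aaa for 'ccc aaa'. Each extra priority level needs another nested lookahead, which is where the select version stays much easier to read.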
Upvotes: 2