Reputation: 17
I would like to remove duplicated expressions from a given string using SAS code. Each expression is delimited by a space and respects the following REGEX /[A-Z]_\d{2}.\d{2}(.[a-z])?/.
Here is the code:
data want;
text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";
do i=1 to countw(text);
Nondups=prxchange('s/\b(\w+)\s\1/$1/',-1,compbl(text));
end;
run;
The desired result should be: Nondups ="X_99.99.a A_12.00 A_13.00"
What should be the regular expression to be used inside the function prxchange?
Any help appreciated.
Upvotes: 1
Views: 225
Reputation: 627536
You may use
Nondups=strip(prxchange('s/\s*([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?)(?=.*\1)//',-1, text));
See the regex demo
The pattern matches:
\s*
- 0+ whitespaces
([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?)
- Group 1:
[A-Z]
- an uppercase ASCII letter
_
- an underscore
\d{2}
- two digits
\.
- a dot (must be escaped)
\d{2}
- two digits
(?:\.[a-z])?
- an optional group matching 1 or 0 sequences of a . and a lowercase ASCII letter
(?=.*\1)
- a positive lookahead that requires the value stored in Group 1 to appear again somewhere to the right of the current location (.* matches any 0+ chars other than line break chars, as many as possible). A token is therefore removed whenever a later copy of it exists, so the last occurrence of each token is the one that is kept.
Upvotes: 1
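The lookahead pattern removes every token that has a later copy, so for the sample string the surviving tokens come out in the order of their last occurrences (A_13.00 A_12.00 X_99.99.a) rather than the order requested in the question. If first-occurrence order matters, a regex-free loop is one possible alternative. The sketch below is not part of the answer above; it assumes tokens are separated by single or multiple blanks only, and the variable names word and i and the lengths $200 and $20 are illustrative choices.

```sas
data want;
  /* Lengths are assumptions; size them to fit the real data */
  length text nondups $200 word $20;
  text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";
  nondups = '';

  /* Pass ' ' as the delimiter: the default delimiter list of COUNTW and SCAN
     also contains '.', which would split each token at its dots */
  do i = 1 to countw(text, ' ');
    word = scan(text, i, ' ');

    /* FINDW returns 0 when the whole word is not yet present in nondups */
    if findw(nondups, strip(word), ' ') = 0 then
      nondups = catx(' ', nondups, word);
  end;

  drop i word;
run;
```

Because FINDW compares whole blank-delimited words here, A_12.00 is not treated as a duplicate of a longer token such as A_12.00.a. For the sample string, nondups should end up as X_99.99.a A_12.00 A_13.00.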