David

Reputation: 17

SAS: Remove duplicated expressions from given list using REGEX

I would like to remove duplicated expressions from a given string using SAS code. The expressions are separated by spaces, and each one matches the following regex: /[A-Z]_\d{2}.\d{2}(.[a-z])?/.

Here is the code:

data want;
text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";
do i=1 to countw(text);
Nondups=prxchange('s/\b(\w+)\s\1/$1/',-1,compbl(text));
end;
run;

The desired result should be: Nondups ="X_99.99.a A_12.00 A_13.00"

Which regular expression should I use inside the prxchange function?

Any help appreciated.

Upvotes: 1

Views: 225

Answers (1)

Wiktor Stribiżew

Reputation: 627536

You may use

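/* STRIP rather than TRIM: TRIM removes only trailing blanks, and the replacement can leave a leading one */
Nondups=strip(prxchange('s/\s*([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?)(?=.*\1)//',-1, text));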

See the regex demo
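
For context, here is a minimal sketch of the replacement inside a complete data step, using the sample string from the question. The variable lengths ($200) and the PUT statement are assumptions added for illustration. The call is wrapped in STRIP because the replacement can leave a leading blank. With this input, the log should show A_13.00 A_12.00 X_99.99.a.

```sas
data want;
  /* lengths are an assumption; adjust to the real data */
  length text Nondups $200;
  text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";

  /* delete every expression that occurs again further to the right */
  Nondups = strip(prxchange('s/\s*([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?)(?=.*\1)//', -1, text));

  /* write the result to the log for inspection */
  put Nondups=;
run;
```

No DO loop is needed: with -1 as the second argument, prxchange replaces every match in a single call.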

The pattern matches:

  • \s* - 0+ whitespaces
  • ([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?) - Group 1:
    • [A-Z] - an uppercase ASCII letter
    • _ - an underscore
    • \d{2} - two digits
    • \. - a dot (must be escaped)
    • \d{2} - two digits
    • (?:\.[a-z])? - an optional group matching 1 or 0 sequences of a . and a lowercase ASCII letter
  • (?=.*\1) - a positive lookahead that succeeds only if the value captured in Group 1 occurs again somewhere to the right of the current location, after any 0+ chars other than line break chars. A match is therefore removed only when a later copy of the same expression exists.
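
Note that the lookahead-based replacement keeps the last copy of each expression, so the surviving items appear in the order of their last occurrence rather than in the order shown in the question. If the first-occurrence order (X_99.99.a A_12.00 A_13.00) is required, one alternative is to drop the regex and build the result word by word. The sketch below is not the regex approach above. It uses SCAN, FINDW and CATX, and the variable names word and i and the $200 lengths are assumptions.

```sas
data want;
  /* lengths are an assumption; adjust to the real data */
  length text Nondups word $200;
  text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";
  Nondups = "";

  /* pass " " explicitly: the default delimiters include the period,
     which would split A_12.00 into pieces */
  do i = 1 to countw(text, " ");
    word = scan(text, i, " ");
    /* append the word only if it is not already present as a whole word */
    if findw(Nondups, strip(word), " ") = 0 then
      Nondups = catx(" ", Nondups, word);
  end;

  drop i word;
  put Nondups=;
run;
```

Because FINDW compares whole space-delimited words, an expression such as A_12.00 is not treated as a duplicate of a longer one such as A_12.00.b.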

Upvotes: 1
