user3776749

Reputation: 697

Python Regular Expressions: making multiple different substitutions in a single pass using Groups

I'm tasked with taking a string, finding all instances of two different types of matches in it, and performing a similar-but-different replacement on each match of each type, all using a single regex and a single pass through re.sub().

Specifically, I'm looking for any < or <= and replacing them with > and >= respectively. Each comparison operator to be replaced sits between two words, as defined by \w*, with zero or more spaces (\s*) on either side.

I have found a regular expression that finds all necessary matches and lumps them into useful groups:

((\b\w*(\s*<\s*)\w*\b)|(\b\w*(\s*<=\s*)\w*\b))+

This matches every comparison that meets the search criteria, with each < captured in match group \3 and each <= captured in match group \5.
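As a quick check of that grouping claim, here is a minimal sketch that lists what each match captures. It uses the pattern from the question unchanged; the variable names `sample` and `found` are only illustrative.

```python
import re

# Pattern copied from the question.
r1 = re.compile(r"((\b\w*(\s*<\s*)\w*\b)|(\b\w*(\s*<=\s*)\w*\b))+")

sample = "v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"

# For each match, record the whole match plus groups 3 and 5.
# A group that did not take part in the match is reported as None.
found = [(m.group(0), m.group(3), m.group(5)) for m in r1.finditer(sample)]

for whole, lt, le in found:
    print(repr(whole), repr(lt), repr(le))
# 'v1 < v2' ' < ' None
# 'v3 <= v4' None ' <= '
```

Only one of the two groups is set for any given match, which is why a plain replacement template such as r"\3" cannot express both substitutions by itself.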

My question is this: Is there a way to replace all \3 with ' > ' and all \5 with ' >= ' in a single call to re.sub()? I've read through the documentation for the sub method in python re but haven't been able to find a way, perhaps due to my limited familiarity with the syntax and behavior of the whole system.

I am allowed and expected to compile the regex separately before the substitution, so the final setup will look something like this:

r1 = re.compile(r"((\b\w*(\s*<\s*)\w*\b)|(\b\w*(\s*<=\s*)\w*\b))+")
subStr = r" ??? " 

r1.sub( ???, subStr ??? )

Here is some example input/output:

input string :

"v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"

running the substitution would produce:

"v1 > v2 v3 >= v4 v5 > v6 v7 >= v8"

Plugging my pattern and the input string into https://regex101.com/ (with the Python flavor selected) shows that the pattern matches the input string in the way I described.

Upvotes: 1

Views: 492

Answers (1)

Casimir et Hippolyte

Reputation: 89557

You only have to make the = optional and capture the parts around the <:

re.sub(r'\b(?<=\w)(\s*)<(=?\s*\w)', r'\1>\2', s)

For efficiency, I started the pattern with the word boundary \b; the lookbehind (?<=\w) that follows ensures there is at least one word character before the operator.
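Since the question asks for a separately compiled regex, the same pattern can be used through a compiled object. This is a minimal sketch; the names `FLIP_LT` and `flip_less_than` are hypothetical and exist only for illustration.

```python
import re

# Same pattern as above, compiled once up front.
# Group 1: whitespace before the '<'.
# Group 2: the optional '=', any whitespace, and the first character of the next word.
FLIP_LT = re.compile(r'\b(?<=\w)(\s*)<(=?\s*\w)')

def flip_less_than(text):
    """Replace < with > and <= with >= when the operator sits between two words."""
    return FLIP_LT.sub(r'\1>\2', text)

print(flip_less_than("v1 < v2 v3 <= v4 v5 > v6 v7 >= v8"))
# v1 > v2 v3 >= v4 v5 > v6 v7 >= v8
```

Because the = is captured as part of group 2, one replacement template covers both operators, and the original spacing around them is preserved.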

Upvotes: 3
