jxie0755

Reputation: 1742

How to replace/modify a pattern by regular expression in python?

Assume that I want to modify every occurrence of a pattern in a script. Take one line as an example:

line = "assert Solution().oddEvenList(genNode([2,1,3,5,6,4,7])) == genNode([2,3,6,7,1,5,4]), 'Example 2'"

Notice that the function genNode takes a List[int] as its parameter. What I want is to remove the list brackets and keep all the integers in the list, so that the function effectively takes *nums as its parameters.

Expecting:

line = "assert Solution().oddEvenList(genNode(2,1,3,5,6,4,7)) == genNode(2,3,6,7,1,5,4), 'Example 2'"

I've come up with a re pattern

r"([g][e][n][N][o][d][e][(])([[][0-9\,\s]*[]])([)])"

but I am not sure how I could use this... I can't get re.sub to work as it requires me to replace with a fixed string.

How can I achieve my desired result?

Upvotes: 5

Views: 1830

Answers (2)

The fourth bird

Reputation: 163217

Instead of writing [g][e][n][N][o][d][e][(] you could write genNode\(

The character class you use, [0-9\,\s]*, matches zero or more of any of the listed characters. It could, for example, match only commas, and it does not ensure that the digits are comma-separated.

To match comma-delimited integers, match 1+ digits followed by a repeating group that matches a comma and 1+ digits.
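As a small illustration of the difference, the sketch below compares the bracketed part of the question's pattern with the stricter form described above. The sample strings and the variable names (loose, strict, results) are made up for this comparison.

```python
import re

# Character class from the question: any mix of digits, commas, whitespace.
loose = re.compile(r"\[[0-9,\s]*\]")
# Stricter form: one integer, then zero or more ",integer" repetitions.
strict = re.compile(r"\[\d+(?:,\d+)*\]")

# Illustrative inputs (not from the question).
samples = ["[2,1,3]", "[7]", "[,,,]", "[]"]

# For each sample, record whether each pattern matches the whole string.
results = {
    s: (bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
    for s in samples
}

for s, (loose_ok, strict_ok) in results.items():
    print(f"{s!r}: loose={loose_ok}, strict={strict_ok}")
```

The loose class accepts "[,,,]" and "[]", while the strict form rejects both and still accepts a single integer such as "[7]".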

At the end, use a positive lookahead to assert the closing parenthesis, or capture it in group 3 and include that group in the replacement.

With this pattern, use r'\1\2' as the replacement.

(genNode\()\[(\d+(?:,\d+)*)\](?=\))

Explanation

  • (genNode\() Capture in group 1 matching genNode(
  • \[ Match [
  • ( Capturing group 2
    • \d+(?:,\d+)* Match 1+ digits and repeat 0+ times a comma and 1+ digits (to also support a single digit)
  • ) Close group 2
  • \] Match ]
  • (?=\)) Positive lookahead, assert what is on the right is a closing parenthesis )
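The alternative mentioned earlier, capturing the closing parenthesis in group 3 instead of asserting it with a lookahead, might look like the sketch below. The variable names are illustrative. On the question's line, both variants should produce the same result.

```python
import re

line = "assert Solution().oddEvenList(genNode([2,1,3,5,6,4,7])) == genNode([2,3,6,7,1,5,4]), 'Example 2'"

# Variant 1: lookahead asserts ")" without consuming it.
with_lookahead = r"(genNode\()\[(\d+(?:,\d+)*)\](?=\))"
# Variant 2: ")" is consumed as group 3 and written back in the replacement.
with_group3 = r"(genNode\()\[(\d+(?:,\d+)*)\](\))"

result_lookahead = re.sub(with_lookahead, r"\1\2", line)
result_group3 = re.sub(with_group3, r"\1\2\3", line)

print(result_lookahead == result_group3)
print(result_group3)
```

The lookahead keeps the replacement string shorter. The group 3 form may be easier to read for someone unfamiliar with lookarounds.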


For example

import re

regex = r"(genNode\()\[(\d+(?:,\d+)*)\](?=\))"
line = "assert Solution().oddEvenList(genNode([2,1,3,5,6,4,7])) == genNode([2,3,6,7,1,5,4]), 'Example 2'"
result = re.sub(regex, r"\1\2", line)

# re.sub always returns a string, so it can be printed directly
print(result)

Result

assert Solution().oddEvenList(genNode(2,1,3,5,6,4,7)) == genNode(2,3,6,7,1,5,4), 'Example 2'
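The question's original character class included \s, which suggests some lists in the script may contain spaces. The strict pattern above does not allow them. A possible variant that tolerates optional whitespace is sketched below. It assumes only non-negative integers appear, and spaced_line is a made-up input.

```python
import re

# Allow optional whitespace after "[", around each comma, and before "]".
# Assumption: the lists hold non-negative integers only.
regex_ws = r"(genNode\()\[\s*(\d+(?:\s*,\s*\d+)*)\s*\](?=\))"

# Hypothetical line with spaces inside the lists.
spaced_line = "assert check(genNode([2, 1, 3])) == genNode([ 10,20 ]), 'spaced'"

spaced_result = re.sub(regex_ws, r"\1\2", spaced_line)
print(spaced_result)
```

Whitespace between the numbers is kept as written because it sits inside group 2. The padding next to the brackets is dropped along with them.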

Upvotes: 1

heemayl

Reputation: 41987

You can do:

re.sub(r'(genNode\()\[([^]]+)\]', r'\1\2', line)
  • (genNode\() matches genNode( and puts it in capture group 1
  • \[ matches a literal [
  • ([^]]+) matches up to the next ] and puts it in capture group 2
  • \] matches a literal ]

In the replacement, we use only the captured groups, i.e. we drop [ and ].
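On the concern that re.sub needs a fixed replacement string: besides backreferences such as \1, re.sub also accepts a function that receives each match object and returns the replacement text. A minimal sketch, in which drop_brackets is a hypothetical helper name:

```python
import re

line = "assert Solution().oddEvenList(genNode([2,1,3,5,6,4,7])) == genNode([2,3,6,7,1,5,4]), 'Example 2'"

def drop_brackets(match):
    """Build the replacement for one match: prefix plus the cleaned items."""
    # group(1) is "genNode(", group(2) is the text between "[" and "]".
    items = [item.strip() for item in match.group(2).split(",")]
    return match.group(1) + ",".join(items)

# Passing a callable lets the replacement be computed per match.
result = re.sub(r"(genNode\()\[([^]]+)\]", drop_brackets, line)
print(result)
```

A function is only needed when the replacement requires some computation, here stripping stray spaces around each item. For simply dropping the brackets, the \1\2 template is enough.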


You can get rid of the first captured group by using a zero-width positive lookbehind to match the portion before [:

re.sub(r'(?<=genNode\()\[([^]]+)\]', r'\1', line)

Example:

In [444]: line = "assert Solution().oddEvenList(genNode([2,1,3,5,6,4,7])) == genNode([2,3,6,7,1,5,4]), 'Example 2'"

In [445]: re.sub(r'(genNode\()\[([^]]+)\]', r'\1\2', line)
Out[445]: "assert Solution().oddEvenList(genNode(2,1,3,5,6,4,7)) == genNode(2,3,6,7,1,5,4), 'Example 2'"

In [446]: re.sub(r'(?<=genNode\()\[([^]]+)\]', r'\1', line)
Out[446]: "assert Solution().oddEvenList(genNode(2,1,3,5,6,4,7)) == genNode(2,3,6,7,1,5,4), 'Example 2'"

FWIW, using the typical non-greedy pattern .*? instead of [^]]+ would work as well:

re.sub(r'(?<=genNode\()\[(.*?)\]', r'\1', line)
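The two forms agree on the example line but can differ on edge cases. The sketch below uses two made-up inputs: an empty list, and a list split across two lines. [^]]+ needs at least one character but can cross newlines, whereas .*? can match nothing but stops at a newline unless re.DOTALL is set.

```python
import re

negated = r"(?<=genNode\()\[([^]]+)\]"
lazy = r"(?<=genNode\()\[(.*?)\]"

# Hypothetical edge cases, not taken from the question.
empty = "genNode([])"
multiline = "genNode([1,\n2])"

# Empty list: [^]]+ requires one or more characters, .*? accepts zero.
empty_negated = re.sub(negated, r"\1", empty)
empty_lazy = re.sub(lazy, r"\1", empty)

# List spanning lines: "." does not match "\n" by default.
multi_negated = re.sub(negated, r"\1", multiline)
multi_lazy = re.sub(lazy, r"\1", multiline)
multi_lazy_dotall = re.sub(lazy, r"\1", multiline, flags=re.DOTALL)

print(repr(empty_negated), repr(empty_lazy))
print(repr(multi_negated), repr(multi_lazy), repr(multi_lazy_dotall))
```

Which behaviour is preferable depends on whether the script contains empty or multi-line lists.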

Upvotes: 2
