JFerro

Reputation: 3433

python regex repeated pattern

I am looking for a regular expression to match:

[document n] m

in order to remove [document n] only when n = m, where n is any number.

So [document 34] 34 should match, but [document 34] 45 should not, because the numbers are different.

So far I have this:

import re
text = "[document 23] 23 and [document 34] 48 are white"
# Removes the tag before ANY number, whether or not the two numbers are equal
text = re.sub(r"(\[document \d+\] )(\d+)", r"\2. ", text)
print(text)

But this does not ensure that the two numbers are equal.
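To see the problem concretely, here is a small sketch using a variant of the attempt above that also captures n (the extra capture group is an illustration, not part of the original code). `re.findall` shows that both pairs are accepted, including the unequal one:

```python
import re

text = "[document 23] 23 and [document 34] 48 are white"

# Same shape as the attempt, but with the tag number captured as well,
# so we can inspect which (n, m) pairs the pattern accepts.
pattern = r"\[document (\d+)\] (\d+)"
pairs = re.findall(pattern, text)
print(pairs)  # [('23', '23'), ('34', '48')]: the unequal pair matches too
```

Nothing in the pattern ties the second `\d+` to the first one, which is what a backreference is for.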

Any idea?

Upvotes: 3

Views: 159

Answers (1)

Wiktor Stribiżew

Reputation: 626690

You can use

\[document\s+(\d+)]\s+\1(?!\d)

See the regex demo. Replace with \1. Details:

  • \[document - the literal string [document
  • \s+ - one or more whitespaces
  • (\d+) - Group 1 (\1): one or more digits
  • ] - a literal ] char (outside a character class it does not need escaping)
  • \s+ - one or more whitespaces
  • \1 - backreference to Group 1
  • (?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
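The role of `(?!\d)` is easiest to see on the `[document 24] 240` case. This sketch compares the pattern with and without the lookahead (the variable names are only for illustration):

```python
import re

sample = "[document 24] 240 text"

with_lookahead = r"\[document\s+(\d+)]\s+\1(?!\d)"
without_lookahead = r"\[document\s+(\d+)]\s+\1"

# Without (?!\d), \1 ("24") matches the first two digits of "240".
print(re.search(without_lookahead, sample).group(0))  # [document 24] 24
# With the lookahead, the following "0" makes the match fail.
print(re.search(with_lookahead, sample))  # None
```

With `re.sub`, the version without the lookahead would turn `[document 24] 240 text` into `240 text`, silently dropping a tag whose numbers differ.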

See the Python demo:

import re
text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print(re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text))
## => 23 and [document 34] 48 are white [document 24] 240 text
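If the equality rule ever needs to be more than "same digits", an alternative (a different technique from the backreference above, not part of the answer) is to capture both numbers and let a replacement function decide. `re.sub` accepts a callable as the replacement; the names `TAG_PAIR` and `drop_tag_if_same` below are hypothetical:

```python
import re

# Capture both numbers independently; the equality check happens in Python.
TAG_PAIR = re.compile(r"\[document\s+(\d+)]\s+(\d+)")

def drop_tag_if_same(match: re.Match) -> str:
    n, m = match.group(1), match.group(2)
    # Keep only the trailing number when it equals the tag number;
    # otherwise return the matched text unchanged.
    return m if n == m else match.group(0)

text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print(TAG_PAIR.sub(drop_tag_if_same, text))
# 23 and [document 34] 48 are white [document 24] 240 text
```

Because the second `\d+` is greedy, it consumes all of `240`, so no lookahead is needed here. Comparing the strings keeps the same "identical digits" semantics as the backreference; comparing `int(n) == int(m)` instead would also treat `7` and `07` as equal, if that were wanted.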

Upvotes: 5
