python regex repeated pattern

Question

I am looking for a regular expression to match:

[document n] m

in order to remove [document n] only when n = m,

where n is any number.

So [document 34] 34 should match, but [document 34] 45 should not, because the numbers are different.

So far I have this:

import re
text = "[document 23] 23 and [document 34] 48 are white"
text = re.sub(r"(\[document \d+\] )(\d+)",r"\2. ",text)

But this does not ensure that the numbers are equal.

Any idea?
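To see the problem concretely, here is a minimal sketch that runs the attempt above on the sample text, assuming the brackets are escaped as \[ and \] in the pattern. Both tags are stripped, including [document 34] even though 34 and 48 differ, because the second (\d+) is not tied to the first number in any way.

```python
import re

text = "[document 23] 23 and [document 34] 48 are white"
# Group 1 is the whole "[document n] " tag, group 2 is whatever number follows.
# Nothing links n to the second number, so every tag is removed.
result = re.sub(r"(\[document \d+\] )(\d+)", r"\2. ", text)
print(result)
```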

Wiktor Stribiżew · Accepted Answer

You can use

\[document\s+(\d+)]\s+\1(?!\d)

Replace the match with \1 (the captured number). Details:

\[document - the literal [document string
\s+ - one or more whitespaces
(\d+) - Group 1 (\1): one or more digits
] - a ] char
\s+ - one or more whitespaces
\1 - backreference to Group 1
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
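The same pattern can be written with re.VERBOSE so that each part from the list above carries its own comment. This is only a sketch; the names DOC_TAG and no_lookahead are illustrative choices. The last check shows why (?!\d) matters: without it, [document 24] 240 would match on the first two digits of 240.

```python
import re

# The answer's pattern, spread out with re.VERBOSE. In verbose mode plain
# whitespace and comments are ignored, so spaces must be written as \s.
DOC_TAG = re.compile(r"""
    \[document   # the literal text [document  (the [ is escaped)
    \s+          # one or more whitespace characters
    (\d+)        # group 1: the document number n
    ]            # a literal ] (no escape needed outside a character class)
    \s+          # one or more whitespace characters
    \1           # backreference: the very same digits again, so m must equal n
    (?!\d)       # fail if another digit follows, so 24 cannot match the start of 240
""", re.VERBOSE)

samples = ["[document 34] 34", "[document 34] 45", "[document 24] 240"]
for s in samples:
    print(s, "->", bool(DOC_TAG.search(s)))

# Without the lookahead, the third sample matches on the "24" inside "240".
no_lookahead = re.compile(r"\[document\s+(\d+)]\s+\1")
print(bool(no_lookahead.search("[document 24] 240")))
```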

See the Python demo:

import re
text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print(re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text))
## => 23 and [document 34] 48 are white [document 24] 240 text
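If you would rather not rely on a backreference, a different technique (not the one used in this answer) is to capture both numbers and let a replacement function decide. This is a minimal sketch; TAG_AND_NUMBER and drop_matching_tag are hypothetical names. Because the second \d+ is greedy, 240 is captured whole and compared against 24, so no lookahead is needed.

```python
import re

# Capture n and m separately; the comparison happens in Python, not in the regex.
TAG_AND_NUMBER = re.compile(r"\[document\s+(\d+)]\s+(\d+)")

def drop_matching_tag(match: re.Match) -> str:
    n, m = match.group(1), match.group(2)
    # Keep only the trailing number when both numbers are identical strings;
    # otherwise return the whole match unchanged.
    return m if n == m else match.group(0)

text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print(TAG_AND_NUMBER.sub(drop_matching_tag, text))
```

Like the backreference, this compares the digits as strings, so [document 7] 07 is left alone; changing the test to int(n) == int(m) would treat them as equal, if that is what you want.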
