Reputation: 3433
I am looking for a regular expression to match:
[document n] m
in order to get rid of [document n] only when n=m
where n is any number
So [document 34] 34 will be a match, but [document 34] 45 will not, because the numbers are different.
So far I have this:
import re
text = "[document 23] 23 and [document 34] 48 are white"
text = re.sub(r"(\[document \d+\] )(\d+)",r"\2. ",text)
But this does not ensure that the numbers are equal.
Any idea?
Upvotes: 3
Views: 159
Reputation: 626690
You can use
\[document\s+(\d+)]\s+\1(?!\d)
See the regex demo. Replace with \1. Details:
- \[document - a [document string
- \s+ - one or more whitespaces
- (\d+) - Group 1 (\1): one or more digits
- ] - a ] char
- \s+ - one or more whitespaces
- \1 - backreference to Group 1
- (?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
See the Python demo:
import re
text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print( re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text) )
## => 23 and [document 34] 48 are white [document 24] 240 text
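If you need this in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: the names SAME_NUMBER and strip_matching_tags are made up for illustration, and the pattern is the one shown above, unchanged. The extra inputs exercise the cases the lookahead and the backreference are there to reject.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer above.
SAME_NUMBER = re.compile(r"\[document\s+(\d+)]\s+\1(?!\d)")

def strip_matching_tags(text):
    """Drop the "[document n] " prefix only when the number after it equals n."""
    # \1 in the replacement keeps the number and discards the bracketed tag.
    return SAME_NUMBER.sub(r"\1", text)

print(strip_matching_tags("[document 34] 34"))    # => 34
print(strip_matching_tags("[document 34] 45"))    # => [document 34] 45 (numbers differ)
print(strip_matching_tags("[document 24] 240"))   # => [document 24] 240 (blocked by (?!\d))
print(strip_matching_tags("[document 240] 24"))   # => [document 240] 24 (\1 needs all of 240)
print(strip_matching_tags("[document 7]   7."))   # => 7. (\s+ allows several spaces)
```

Without (?!\d), the third input would match on the prefix 24 of 240 and come out as 240, which is why the lookahead matters.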
Upvotes: 5