Reputation: 911
I have a string that ends with redundant text: the same substring appears twice in a row at the end. I want to remove all of that redundant text (both the first and the second instance). How can I find the repeated text at the end of a string and remove it?
In my example, the string also has a prefix that I'm removing. For example, I want:
prefix a b c d e 123 d e 123
to return:
a b c
The duplicate substring can vary in length. So I would want:
prefix a b c 123 c 123
to return:
a b
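The expected behaviour can be pinned down with a small non-regex reference check. This is a minimal sketch, not the approach the question asks for. It assumes the prefix is the literal "prefix " and that the redundant chunk appears exactly twice back to back at the very end. The helper name strip_doubled_suffix is hypothetical.

```python
def strip_doubled_suffix(text: str, prefix: str = "prefix ") -> str:
    """Hypothetical helper: drop a leading prefix, then a chunk that
    occurs twice in a row at the end of the string."""
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Try chunk sizes from the longest possible (half the string) down to 1.
    for size in range(len(text) // 2, 0, -1):
        tail = text[-size:]
        # The chunk must appear immediately before itself at the end.
        if text[-2 * size:-size] == tail:
            # Keep what precedes both copies, minus trailing whitespace.
            return text[:-2 * size].rstrip()
    # No doubled suffix found: return the text (without prefix) unchanged.
    return text


print(strip_doubled_suffix("prefix a b c d e 123 d e 123"))
print(strip_doubled_suffix("prefix a b c 123 c 123"))
```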
I tried matching this with
import re
re.sub(
r'prefix ([a-z ]*)\2([a-z ]* \d*)$',
r'\1',
'prefix a b c 123 c 123'
)
but of course this raises a forward-reference error (an re.error), since I'm referring to the contents of \2 before group 2 has been defined.
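The failure happens while the pattern is compiled, before any input string is examined. You can see this by compiling the question's pattern on its own and catching re.error. The exact message wording may vary between Python versions, so this sketch only captures and prints the message.

```python
import re

# Pattern from the question: \2 is used before group 2 has been opened.
pattern = r'prefix ([a-z ]*)\2([a-z ]* \d*)$'

error_message = None
try:
    re.compile(pattern)
except re.error as exc:
    # The backreference is rejected while the pattern is parsed.
    error_message = str(exc)

print(error_message)
```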
I'm doing this in Python 3.7.
Upvotes: 2
Views: 110
Reputation: 163632
In your pattern, you can put the \2 after the second group, just before the end of the string.
In the replacement, use group 1.
prefix ([a-z ]*)([a-z ]* \d*)\2$
import re
result = re.sub(
r'prefix ([a-z ]*)([a-z ]* \d*)\2$',
r'\1',
'prefix a b c 123 c 123'
)
print(result)
Output
a b
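As a quick check, the sketch below runs the same pattern against both examples from the question. Group 2 is limited to letters/spaces, then a space and digits, and \2 requires an identical copy right before the end. If the pattern does not match, re.sub returns the input unchanged, prefix included. The names PATTERN, examples and results are only illustrative.

```python
import re

# Pattern from this answer: group 1 is the text to keep, group 2 is the
# repeated chunk, and \2 demands a second copy of it right before the end.
PATTERN = r'prefix ([a-z ]*)([a-z ]* \d*)\2$'

examples = [
    'prefix a b c d e 123 d e 123',
    'prefix a b c 123 c 123',
]

# Replace the whole match with group 1 only.
results = [re.sub(PATTERN, r'\1', s) for s in examples]
for original, cleaned in zip(examples, results):
    print(repr(original), '->', repr(cleaned))
```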
Upvotes: 2
Reputation: 786291
You may use this regex to search:
^prefix\s+(.*?)(.+?)\2+$
and use r'\1' for the replacement.
Python Code:
import re
r = re.sub(
r'^prefix\s+(.*?)(.+?)\2+$',
r'\1',
'prefix a b c 123 c 123'
)
print(r)
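The sketch below applies this pattern to the question's two examples. It also adds a third, hypothetical input where the chunk appears three times. That string is cleaned only because of the + after \2, which allows one or more further copies of group 2. Because .*? and .+? match any characters, the chunk is not restricted to letters, spaces and digits.

```python
import re

# Pattern from this answer: lazy group 1 keeps as little as possible,
# lazy group 2 is the repeated chunk, \2+ matches 1+ further copies of it.
PATTERN = r'^prefix\s+(.*?)(.+?)\2+$'

examples = [
    'prefix a b c d e 123 d e 123',   # chunk repeated twice
    'prefix a b c 123 c 123',         # chunk repeated twice
    'prefix a b c 123 c 123 c 123',   # chunk repeated three times
]

results = [re.sub(PATTERN, r'\1', s) for s in examples]
for cleaned in results:
    print(repr(cleaned))
```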
RegEx Details:
- ^: Start
- prefix\s+: Match the text prefix followed by 1+ whitespace characters
- (.*?): Lazily match 0 or more of any characters in capture group #1
- (.+?): Lazily match 1 or more of any characters in capture group #2
- \2+: Match 1 or more repetitions of group #2
- $: End
Upvotes: 3
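The RegEx Details list can also live inside the pattern itself. In re.VERBOSE mode, unescaped whitespace and # comments in the pattern are ignored. This sketch compiles the same regex with each piece annotated; the name PATTERN is only illustrative.

```python
import re

# Same regex as in the answer, written with re.VERBOSE so each piece
# carries its explanation.
PATTERN = re.compile(r"""
    ^          # start of the string
    prefix\s+  # the text "prefix" followed by 1+ whitespace characters
    (.*?)      # group 1: as few characters as possible (the text to keep)
    (.+?)      # group 2: as few characters as possible (the repeated chunk)
    \2+        # 1 or more further copies of group 2
    $          # end of the string
""", re.VERBOSE)

cleaned = PATTERN.sub(r'\1', 'prefix a b c 123 c 123')
print(cleaned)
```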