TrentWoodbury

Reputation: 911

Regex: how to remove redundant substring

I have a string. There is redundant text at the end of this string. I want to remove all of that redundant text (both the first and second instance of the redundant text). How can I find all the repeated text at the end of a string and remove it?

In my example, the string also has a prefix that I want to remove. For example, I want 'prefix a b c d e 123 d e 123' to return 'a b c'.

The duplicate substring can vary in length, so I also want 'prefix a b c 123 c 123' to return 'a b'.

I tried matching this with

import re
re.sub(
    r'prefix ([a-z ]*)\2([a-z ]* \d*)$',
    r'\1',
    'prefix a b c 123 c 123'
)

but of course this failed with a forward-reference error, since the pattern refers to \2 before group 2 has been defined.

I'm doing this in Python 3.7.
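For reference, the failure can be reproduced at compile time, before any input is matched. This is a minimal sketch: the names BAD_PATTERN, compiled_ok and error_message are illustrative and not part of the original post, and the exact wording of the message may differ between Python versions.

```python
import re

# The pattern from the question: \2 appears before group 2 has been defined.
BAD_PATTERN = r'prefix ([a-z ]*)\2([a-z ]* \d*)$'

compiled_ok = True
error_message = None
try:
    re.compile(BAD_PATTERN)
except re.error as exc:
    # The regex parser rejects the pattern itself; no input string is involved.
    compiled_ok = False
    error_message = str(exc)

print(compiled_ok)
print(error_message)
```

Python's re module rejects a numbered backreference to a group that has not been defined yet, so the pattern never gets as far as matching.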

Upvotes: 2

Views: 110

Answers (2)

The fourth bird

Reputation: 163632

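In your pattern, move the \2 backreference so that it comes after the second capture group, just before the end of the string. A backreference can only refer to a group that has already been defined.

In the replacement, use group 1.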

prefix ([a-z ]*)([a-z ]* \d*)\2$


import re
result = re.sub(
    r'prefix ([a-z ]*)([a-z ]* \d*)\2$',
    r'\1',
    'prefix a b c 123 c 123'
)
print(result)

Output

a b
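To check the pattern against both inputs from the question, it can be wrapped in a small function. This is only a sketch: strip_repeated_suffix is a hypothetical helper name, and the third call is an extra input, not from the question, that shows what happens when nothing is repeated.

```python
import re

# Pattern from the answer above: group 2 is captured first, then \2 requires
# the same text to appear again immediately before the end of the string.
PATTERN = re.compile(r'prefix ([a-z ]*)([a-z ]* \d*)\2$')


def strip_repeated_suffix(text):
    """Hypothetical helper: replace the whole match with group 1 only."""
    return PATTERN.sub(r'\1', text)


first = strip_repeated_suffix('prefix a b c d e 123 d e 123')
second = strip_repeated_suffix('prefix a b c 123 c 123')
# Assumed extra input with no repeated block at the end.
unchanged = strip_repeated_suffix('prefix a b c')

print(repr(first))      # 'a b c'
print(repr(second))     # 'a b'
print(repr(unchanged))  # 'prefix a b c'
```

Note that re.sub returns the input unchanged when the pattern does not match, so a string without a repeated block keeps its prefix as well.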

Upvotes: 2

anubhava

Reputation: 786291

You may use this regex for search:

^prefix\s+(.*?)(.+?)\2+$

and use r'\1' as the replacement.


Python Code:

import re

r = re.sub(
    r'^prefix\s+(.*?)(.+?)\2+$',
    r'\1',
    'prefix a b c 123 c 123'
)
print(r)


RegEx Details:

  • ^: Start of the string
  • prefix\s+: Match the text prefix followed by 1+ whitespace characters
  • (.*?): Match 0 or more of any character, as few as possible, in capture group #1
  • (.+?): Match 1 or more of any character, as few as possible, in capture group #2
  • \2+: Match 1 or more repetitions of the text captured in group #2
  • $: End of the string
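The breakdown above can also be written directly into the pattern with re.VERBOSE, which makes it easy to inspect what each group captured. This is a sketch; the variable names PATTERN, kept and repeated are illustrative.

```python
import re

# Same regex as above, spread over several lines with re.VERBOSE so that
# each part carries its own comment (whitespace in the pattern is ignored).
PATTERN = re.compile(
    r'''
    ^prefix\s+   # literal "prefix" followed by 1+ whitespace characters
    (.*?)        # group 1: the text to keep, matched lazily
    (.+?)        # group 2: shortest candidate for the repeated block
    \2+          # one or more further copies of group 2
    $            # end of the string
    ''',
    re.VERBOSE,
)

match = PATTERN.search('prefix a b c 123 c 123')
kept, repeated = match.group(1), match.group(2)
print(repr(kept))      # 'a b'
print(repr(repeated))  # ' c 123'

longer = PATTERN.sub(r'\1', 'prefix a b c d e 123 d e 123')
print(repr(longer))    # 'a b c'
```

Because group 2 is (.+?), this pattern treats any text that repeats back to back at the end of the string as redundant. An input ending in 11, for example, would lose both digits, so it may be worth tightening group 2 if the real data can end that way.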

Upvotes: 3
