jackhab

Reputation: 17708

Remove duplicate words using regex in Python

I need to remove repeated words in a string so that 'the (the)' becomes 'the'. Why doesn't the following work?

re.sub('(.+) \(\1\)', '\1', 'the (the)')

Thanks.

Upvotes: 4

Views: 2048

Answers (2)

jensgram

Reputation: 31518

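You need to doubly escape the back-reference. In a normal string literal, Python reads '\1' as the single character with octal code 1, so the regex engine never sees a back-reference: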

re.sub('(.+) \(\\1\)', '\\1', 'the (the)')
--> the
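
To see why the original call fails, here is a quick check (a sketch added for illustration, not part of the original answer) of what Python does with the string before `re` ever sees it. The parentheses are written as '\\(' here only to keep every escape explicit:

```python
import re

# In a normal string literal, '\1' is an octal escape: the single character chr(1).
plain = '\1'
escaped = '\\1'  # a backslash followed by the digit 1

print(len(plain), plain == chr(1))      # 1 True
print(len(escaped), escaped == r'\1')   # 2 True

# With a single backslash, the pattern looks for a literal chr(1) inside the
# parentheses, so nothing matches and the input comes back unchanged.
unchanged = re.sub('(.+) \\(\1\\)', '\1', 'the (the)')

# With the backslash doubled, the regex engine receives \1 as a back-reference.
fixed = re.sub('(.+) \\(\\1\\)', '\\1', 'the (the)')

print(repr(unchanged))  # 'the (the)'
print(repr(fixed))      # 'the'
```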

Or use the r prefix to write a raw string literal. From the Python documentation:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string.

re.sub(r'(.+) \(\1\)', r'\1', 'the (the)')
--> the
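
If the same cleanup is needed in several places, the raw-string pattern could be wrapped in a small helper. This is only a sketch: the name `drop_parenthesized_repeat`, the restriction to `\w+`, and the leading `\b` are assumptions for illustration, not something from the question.

```python
import re

# Hypothetical helper. \b(\w+) captures a whole word, and \(\1\) requires the
# same word to follow in parentheses. Raw strings keep the backslashes intact.
_REPEAT = re.compile(r'\b(\w+) \(\1\)')

def drop_parenthesized_repeat(text):
    """Replace every 'word (word)' occurrence with just 'word'."""
    return _REPEAT.sub(r'\1', text)

result = drop_parenthesized_repeat('the (the) cat sat on a (a) mat')
print(result)  # the cat sat on a mat
```

Text where the parenthesized word differs, such as 'the (cat)', is left untouched because the back-reference must match the captured word exactly.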

Upvotes: 6

eat

Reputation: 7530

According to the documentation: 'Raw string notation (r"text") keeps regular expressions sane.'

Upvotes: 2
