Sanoop

Reputation: 13

Regex to replace repeated words in a string with a single copy in Python

How can I replace repeating words in a string, with just one copy?

For example:

hi hi hello hello hello bye bye bye bye 

should become:

hi hello bye 

My code:

import re
s = "hi hi hello hello hello bye bye bye bye"
m = re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\2', s)
print(m)

Output:

hi hi hello bye

Upvotes: 0

Views: 54

Answers (1)

Blckknght

Reputation: 104712

You can use:

re.sub(r'\b(\S+)(?: \1)+\b', r'\1', s)

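The \b escape is a zero-width match for a word boundary: the position between a word character (\w) and a non-word character such as whitespace, or the start or end of the text next to a word character. Anchoring the pattern with it keeps inputs like goodbye bye or foo foobar from being trimmed incorrectly, because a match can neither start nor end in the middle of an alphanumeric word.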
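To see what the \b anchors buy you, here is a small sketch comparing the pattern with and without them on the two tricky inputs mentioned above. The constant names are only illustrative.

```python
import re

# The pattern from this answer, and the same pattern with the \b anchors removed.
WITH_BOUNDARIES = r'\b(\S+)(?: \1)+\b'
WITHOUT_BOUNDARIES = r'(\S+)(?: \1)+'

for text in ("goodbye bye", "foo foobar"):
    anchored = re.sub(WITH_BOUNDARIES, r'\1', text)
    unanchored = re.sub(WITHOUT_BOUNDARIES, r'\1', text)
    # The anchored result is unchanged; the unanchored one eats part of a word.
    print(repr(text), '->', repr(anchored), 'vs', repr(unanchored))
```

Without the anchors, goodbye bye collapses to goodbye and foo foobar collapses to foobar, because the repeat is allowed to start or end inside a longer word.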

The inner part of the pattern matches a word followed by one or more repeats of the same word, each separated by a single space. The whole match is replaced by one copy of the word.
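Applied to the string from the question, that might look like the sketch below. The function names are made up for illustration, and the second variant (using \s+ instead of a literal space, as the question's own pattern does) is an assumption about what you may want if the repeats can be separated by tabs or several spaces.

```python
import re

def collapse_repeats(text):
    # Hypothetical helper: one word, then one or more " word" repeats,
    # all replaced by a single copy of the word.
    return re.sub(r'\b(\S+)(?: \1)+\b', r'\1', text)

def collapse_repeats_any_whitespace(text):
    # Assumed variant: same idea, but repeats may be separated by any
    # run of whitespace rather than exactly one space.
    return re.sub(r'\b(\S+)(?:\s+\1)+\b', r'\1', text)

s = "hi hi hello hello hello bye bye bye bye"
print(collapse_repeats(s))  # hi hello bye
print(collapse_repeats_any_whitespace("hi   hi hello"))  # hi hello
```

Unlike the pattern in the question, which needs at least three copies before it matches, this one already fires on two copies, so hi hi is collapsed as well.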

Upvotes: 1
