Reputation: 13
How can I replace repeated words in a string with just one copy?
For example:
hi hi hello hello hello bye bye bye bye
should become:
hi hello bye
My code:
import re
s = "hi hi hello hello hello bye bye bye bye"
m=re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\2', s)
print m
Output:
hi hi hello bye
Upvotes: 0
Views: 54
Reputation: 104712
You can use:
re.sub(r'\b(\S+)(?: \1)+\b', r'\1', s)
The \b escape is a zero-width match for a word boundary (the position between a word character and a non-word character such as whitespace, or the start or end of the text next to a word character). Using it lets the rest of the pattern work without input like goodbye bye or foo foobar getting trimmed incorrectly.
The inner part of the pattern matches a word followed by one or more repeats of the same word separated by spaces. The whole thing is replaced by one copy of the word.
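As a quick check, here is a minimal sketch of that pattern run against the question's input and the two edge cases mentioned above. It assumes Python 3 (hence print as a function), and the helper name collapse_repeats is made up for illustration.

```python
import re

def collapse_repeats(text):
    # Hypothetical helper name, not from the original answer.
    # \b(\S+)   captures a word that starts at a word boundary
    # (?: \1)+  matches one or more repeats, each preceded by a single space
    # \b        stops a repeat from matching only the prefix of a longer word
    return re.sub(r'\b(\S+)(?: \1)+\b', r'\1', text)

print(collapse_repeats("hi hi hello hello hello bye bye bye bye"))  # hi hello bye
print(collapse_repeats("goodbye bye"))  # goodbye bye (unchanged)
print(collapse_repeats("foo foobar"))   # foo foobar (unchanged)
```

Note that the pattern only treats a single literal space as a separator. If the repeats might be separated by tabs or several spaces, replacing the space in the pattern with \s+ should handle that, though I have only tried it on the inputs above.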
Upvotes: 1