Reputation: 19
I am trying to remove all repeated words from a string. What is the correct syntax to match one or more repetitions of a word? The example below returns
cat cat in the hat hat hat
It does not handle words that are repeated more than once; only "in" and "the", which are each repeated exactly once, are fully collapsed.
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat cat cat in in the the hat hat hat hat hat hat')
Upvotes: 0
Views: 161
Reputation: 2406
A non-regex alternative, when word order isn't important, would be:
" ".join(set(string_with_duplicates.split()))
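This first splits the string on whitespace, turns the resulting list into a set (which removes duplicates, as each element of a set is unique), and then joins the items back into a string. Sets are unordered, so the word order in the result is arbitrary and may differ from the output shown here.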
>>> string_with_duplicates = 'cat cat cat in in the the hat hat hat hat hat hat'
>>> " ".join(set(string_with_duplicates.split()))
'the in hat cat'
If the order of the words needs to be preserved, you could write something like this:
>>> unique = []
>>> for w in string_with_duplicates.split():
...     if w not in unique:
...         unique.append(w)
...
>>> " ".join(unique)
'cat in the hat'
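The same order-preserving idea can be sketched more compactly with `dict.fromkeys`, assuming Python 3.7 or later, where dictionaries keep insertion order. Like the loop above, it drops every later occurrence of a word, not only adjacent repeats. The helper name `unique_words` is hypothetical.

```python
def unique_words(text):
    """Return text with each word kept only at its first occurrence."""
    # dict.fromkeys keeps the first occurrence of each key, in insertion
    # order (guaranteed from Python 3.7 onward).
    return " ".join(dict.fromkeys(text.split()))


print(unique_words('cat cat cat in in the the hat hat hat hat hat hat'))
# cat in the hat
```

This avoids the repeated list scan done by `w not in unique`, which may matter for long inputs.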
Upvotes: 0
Reputation: 4138
This should print the given sentence with consecutive duplicates removed:
check_for_repeats = 'cat cat cat in in the the hat hat hat hat hat hat'
words = check_for_repeats.split()
sentence_array = []
# Keep a word only when the next word differs, i.e. the last word of each run.
for i, word in enumerate(words[:-1]):
    if word != words[i + 1]:
        sentence_array.append(word)
# The final word always ends the last run, so keep it if there is any input.
if words:
    sentence_array.append(words[-1])
sentence = ' '.join(sentence_array)
print(sentence)
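As a possible alternative to the manual loop, the standard library's `itertools.groupby` groups consecutive equal items, so each run of a repeated word collapses to its key. This is only a sketch; the function name `collapse_runs` is made up for illustration.

```python
from itertools import groupby


def collapse_runs(text):
    """Collapse each run of adjacent identical words into a single word."""
    # groupby yields one (key, group) pair per run of consecutive equal words.
    return " ".join(word for word, _run in groupby(text.split()))


print(collapse_runs('cat cat cat in in the the hat hat hat hat hat hat'))
# cat in the hat
print(collapse_runs('the cat and the hat'))
# the cat and the hat
```

Unlike a set-based approach, this leaves non-adjacent repeats (the two occurrences of "the" in the second call) untouched.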
Upvotes: 1
Reputation: 89574
You can use this:
re.sub(r'(\b[a-z]+) (?=\1\b)', '', 'cat cat cat in in the the hat hat hat hat hat hat')
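To spell out why this works: the lookahead `(?=\1\b)` only checks that the same word follows, without consuming it. Each match therefore removes one copy of the word plus its trailing space, and the next copy is still available to start the following match. A minimal sketch, wrapped in a hypothetical function named `drop_repeats_lookahead`:

```python
import re


def drop_repeats_lookahead(text):
    # Delete "word " whenever the very same whole word comes next.
    # The lookahead does not consume the next copy, so runs of any
    # length are reduced to their final occurrence.
    return re.sub(r'(\b[a-z]+) (?=\1\b)', '', text)


print(drop_repeats_lookahead('cat cat cat in in the the hat hat hat hat hat hat'))
# cat in the hat
```

The `\b` inside the lookahead keeps a pair such as `the there` from being treated as a repeat.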
Upvotes: 0
Reputation: 781721
Try this:
re.sub(r'(\b[a-z]+)(?: \1)+', r'\1', 'cat cat cat in in the the hat hat hat hat hat hat')
The `+` quantifier on the non-capturing group `(?: \1)` makes it match one or more repetitions of the captured word, so a whole run is replaced by a single copy.
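One caveat: as written, the pattern has no word boundary after the back-reference, so a word followed by a longer word that starts with it (for example `the there`) is also treated as a repeat. A sketch with a trailing `\b` added to guard against that; `collapse_repeats` is a hypothetical name, not part of the `re` module.

```python
import re


def collapse_repeats(text):
    # \b after \1 requires each repetition to be a whole word,
    # so "the there" is left alone.
    return re.sub(r'\b([a-z]+)(?: \1\b)+', r'\1', text)


print(collapse_repeats('cat cat cat in in the the hat hat hat hat hat hat'))
# cat in the hat
print(collapse_repeats('the there'))
# the there
```

Without the trailing `\b`, the second call would return `there`.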
Upvotes: 0