Reputation: 29
I am looking for a regex pattern that matches only the words in a sentence that have no repeated consecutive characters.
I have tried r'(?!.*(\w)\1{3,}).+' as the regex pattern, but it doesn't work.
For instance, in the sentence 'mike is amaaazing', I want the pattern to pick up only 'mike' and 'is'.
Any ideas?
Upvotes: 1
Views: 483
Reputation: 89557
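You have to use a word boundary at the beginning and replace the dot with \w so that the lookahead can't look past the word being tested. Also drop the {3,} quantifier: (\w)\1{3,} only matches four or more identical characters in a row, so it would not even reject 'amaaazing'.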
>>> import re
>>> s = 'mike is amaaazing'
>>> [m[1] for m in re.findall(r'\b(?!\w*?(\w)\1)(\w+)', s)]
['mike', 'is']
Since re.findall returns only the capture groups when the pattern defines any, each result is a tuple of groups rather than the full match. You can use a list comprehension to extract the second capture group, which holds the whole word.
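If the tuple indexing feels awkward, one possible alternative is re.finditer, which yields match objects, so the whole match is available through group(0) and the outer capture group is no longer needed. The sketch below reuses the pattern from this answer without that outer group; the names NO_REPEAT_WORD and words_without_repeats are made up for illustration.

```python
import re

# Hypothetical names for illustration. The pattern is the one from the answer,
# minus the outer capture group, which finditer does not need.
NO_REPEAT_WORD = re.compile(r'\b(?!\w*?(\w)\1)\w+')

def words_without_repeats(text):
    """Return the words in `text` that have no two identical adjacent characters."""
    # group(0) is the whole match, regardless of capture groups in the pattern.
    return [m.group(0) for m in NO_REPEAT_WORD.finditer(text)]

print(words_without_repeats('mike is amaaazing'))  # ['mike', 'is']
```

Note that \w also matches digits and underscores, so a token such as 'a__b' would be rejected as well.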
Upvotes: 3