Sina

Reputation: 29

Regex to find words in a sentence with no repeated consecutive characters

I am looking for a regex pattern that picks out the words in a sentence that contain no repeated consecutive characters.

I have tried r'(?!.*(\w)\1{3,}).+' as the regex pattern but it doesn't work.

For instance, in the sentence 'mike is amaaazing', I want the regex pattern to pick up 'mike' and 'is' only.

Any ideas?

Upvotes: 1

Views: 483

Answers (2)

Casimir et Hippolyte

Reputation: 89557

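You have to put a word boundary at the beginning and replace the dot with \w, so that the lookahead cannot run past the word being tested. Note also that (\w)\1{3,} only rejects runs of four or more identical characters; a plain \1 is enough to reject any doubled character.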

>>> import re
>>> s = 'mike is amaaazing'
>>> [m[1] for m in re.findall(r'\b(?!\w*?(\w)\1)(\w+)', s)]
['mike', 'is']

Since re.findall returns only the capture groups when the pattern defines any, each result is a tuple of both groups. The list comprehension extracts the second capture group, which holds the whole word.
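If the tuple indexing feels awkward, here is a minimal sketch of the same idea using re.finditer, which yields match objects so the second group can be read by number. The function name words_without_repeats is just an illustrative choice, not part of the re module.

```python
import re

# Same pattern as above: group 1 lives inside the negative lookahead,
# group 2 captures the whole word.
NO_REPEAT = re.compile(r'\b(?!\w*?(\w)\1)(\w+)')

def words_without_repeats(sentence):
    """Return the words that have no two identical characters in a row."""
    # finditer gives match objects, so group 2 can be read directly
    return [m.group(2) for m in NO_REPEAT.finditer(sentence)]

print(words_without_repeats('mike is amaaazing'))  # ['mike', 'is']
```

Keep in mind that \w also matches digits and underscores, so "words" here means runs of those characters.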

Upvotes: 3

Code Maniac

Reputation: 37755

You can try something like this:

\b(?:(\w)(?!\1))+\b
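A rough sketch of how this pattern could be used from Python, assuming you want the whole matched word. Since the only capture group holds a single character, re.findall would return just the last captured character of each word, so this sketch uses re.finditer and group(0) instead. The name match_whole_words is made up for the example.

```python
import re

# Each character is captured, then checked not to be followed by itself.
PATTERN = re.compile(r'\b(?:(\w)(?!\1))+\b')

def match_whole_words(sentence):
    """Return the words matched in full by the pattern."""
    # group(0) is the entire match, regardless of capture groups
    return [m.group(0) for m in PATTERN.finditer(sentence)]

print(match_whole_words('mike is amaaazing'))  # ['mike', 'is']
```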


Upvotes: 2
