ray smith

Reputation: 380

Create a Python regular expression that finds all consonants in each word of a string that are not repeated one after another

For example, if the word 'Happy' is given, I only want 'H' and 'y'.

If 'accomplished' is given, I only want 'm', 'p', 'l', 's', 'h', 'd'.

I know that (\w)\1 will find repeated characters, and (?i)[b-df-hj-np-tv-z] will find all consonants, but how do I combine them?

Upvotes: 0

Views: 4578

Answers (4)

user1902824

Reputation:

Brute force (super slow) solution:

import re

expr = '(?<!b)b(?!b)|(?<!c)c(?!c)|(?<!d)d(?!d)|(?<!f)f(?!f)|(?<!g)g(?!g)|(?<!h)h(?!h)|(?<!j)j(?!j)|(?<!k)k(?!k)|(?<!l)l(?!l)|(?<!m)m(?!m)|(?<!n)n(?!n)|(?<!p)p(?!p)|(?<!q)q(?!q)|(?<!r)r(?!r)|(?<!s)s(?!s)|(?<!t)t(?!t)|(?<!v)v(?!v)|(?<!w)w(?!w)|(?<!x)x(?!x)|(?<!y)y(?!y)|(?<!z)z(?!z)'

print(re.findall(expr, 'happy'))
print(re.findall(expr, 'accomplished'))
print(re.findall(expr, 'happy accomplished'))
print(re.findall(expr, 'happy accccccompliiiiiiishedd'))

# Readable form of expr
# (?<!b)b(?!b)|
# (?<!c)c(?!c)|
# (?<!d)d(?!d)|
# (?<!f)f(?!f)|
# (?<!g)g(?!g)|
# (?<!h)h(?!h)|
# (?<!j)j(?!j)|
# (?<!k)k(?!k)|
# (?<!l)l(?!l)|
# (?<!m)m(?!m)|
# (?<!n)n(?!n)|
# (?<!p)p(?!p)|
# (?<!q)q(?!q)|
# (?<!r)r(?!r)|
# (?<!s)s(?!s)|
# (?<!t)t(?!t)|
# (?<!v)v(?!v)|
# (?<!w)w(?!w)|
# (?<!x)x(?!x)|
# (?<!y)y(?!y)|
# (?<!z)z(?!z)

Output:

['h', 'y']
['m', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h']
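
The 21 alternatives above all follow one template per consonant, so the pattern can also be generated instead of typed out. A minimal sketch, assuming lowercase ASCII input as in the examples; the names CONSONANTS and expr_generated are illustrative and not part of the original answer:

```python
import re
import string

# All lowercase ASCII letters except the five vowels (y counts as a consonant,
# as in the hand-written pattern above).
CONSONANTS = [c for c in string.ascii_lowercase if c not in "aeiou"]

# One "(?<!x)x(?!x)" alternative per consonant, joined with "|".
expr_generated = "|".join("(?<!{0}){0}(?!{0})".format(c) for c in CONSONANTS)

print(re.findall(expr_generated, "happy accccccompliiiiiiishedd"))
```

This should print the same list as the last line of the output above. The pattern is still case-sensitive, so 'Happy' would need re.IGNORECASE or lowercasing first.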

Upvotes: 0

Evpok

Reputation: 4485

You can use

(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)

which unfolds as

(?=[b-df-hj-np-tv-xz]) # Match only if the next character is a consonant
(.)                    # Match the consonant and capture it for subsequent use
(?!\1)                 # Don't match if the next character is the same as the one we captured (avoids matching all but the last character of a cluster)
(?<!\1\1)              # Don't match if the character before the captured one is the same (avoids matching the last character of a cluster)

Sadly, that last line is not allowed in re, as lookbehinds must have a fixed length, but the regex module¹ supports it:

In [1]: import regex
In [2]: s=r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)'

In [3]: regex.findall(s, 'happy')
Out[3]: ['h']

In [4]: regex.findall(s, 'accomplished')
Out[4]: ['m', 'p', 'l', 's', 'h', 'd']

¹ “intended eventually to replace Python’s current re module implementation” according to the cheeseshop description.
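
The Python documentation for lookbehind assertions notes that, from version 3.5, re supports group references of fixed length, so on such versions the pattern above may work without the third-party module. A minimal sketch under that assumption; the second pattern, which adds y to the character class and a (?i) flag so that 'Happy' yields 'H' and 'y' as in the question, is an illustrative variation rather than part of the original answer:

```python
import re

# Pattern from the answer above; y is not in the character class.
pattern = r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)'
print(re.findall(pattern, 'happy'))
print(re.findall(pattern, 'accomplished'))

# Illustrative variation: include y in the class and ignore case.
pattern_with_y = r'(?i)(?=[b-df-hj-np-tv-z])(.)(?!\1)(?<!\1\1)'
print(re.findall(pattern_with_y, 'Happy'))
```

On older interpreters the regex module remains the safer choice.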

Upvotes: 4

anubhava

Reputation: 785481

Here is a regex that can be used:

([^aeiou])\1+|([^aeiou\s])

You can then grab captured group #2


Explanation:

[^aeiou]      # matches any character that is not a lowercase vowel (treated here as a consonant)
([^aeiou])    # puts a consonant in captured group #1
([^aeiou])\1+ # matches repetitions of group #1
|             # regex alternation (OR)
([^aeiou\s])  # matches a consonant and grabs it in captured group #2

Code:

>>> import re
>>> for m in re.finditer(r'([^aeiou])\1+|([^aeiou\s])', "accomplished"):
...     print(m.group(2))
...
None
m
p
l
s
h
d
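
Because the first alternative consumes repeated runs without setting group #2, the None entries can be filtered out when collecting the results. A minimal sketch in Python 3 syntax; the helper name single_consonants is hypothetical:

```python
import re

PATTERN = re.compile(r'([^aeiou])\1+|([^aeiou\s])')

def single_consonants(text):
    """Collect group #2 from every match, skipping runs (where it is None)."""
    return [m.group(2) for m in PATTERN.finditer(text) if m.group(2) is not None]

print(single_consonants("accomplished"))
```

Since [^aeiou] matches any character other than a lowercase vowel, input containing digits, punctuation, or uppercase vowels would need a stricter character class.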

Upvotes: 0

Katpoes

Reputation: 209

from re import findall
string = "Happy you!"
res    = []
for c in findall('[^aeiou]', string): 
    if c not in res:
        res.append(c)   

This filters out duplicates while still using the 're' module you required.
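
Note that the loop above drops every later duplicate in the whole string, not only letters repeated one after another, and [^aeiou] also matches spaces and punctuation. If only adjacent repeats should be excluded, one option is to match each run of a consonant with a backreference and keep the runs of length one. A minimal sketch, assuming ASCII letters and case-insensitive matching; the function name is hypothetical, and this is an alternative technique rather than the method of the answer above:

```python
import re

# A consonant followed by zero or more copies of itself, i.e. one whole run.
RUN = re.compile(r'([b-df-hj-np-tv-z])\1*', re.IGNORECASE)

def unrepeated_consonants(text):
    """Return consonants whose run has length one, in order of appearance."""
    return [m.group(0) for m in RUN.finditer(text) if len(m.group(0)) == 1]

print(unrepeated_consonants("Happy you!"))
print(unrepeated_consonants("accomplished"))
```

This combines the two pieces from the question: the consonant class selects the letter and the backreference swallows its repeats.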

Upvotes: 0
