Reputation: 380
For example, if the word 'Happy' is given, I only want 'H' and 'y'.
If 'accomplished' is given, I only want 'm', 'p', 'l', 's', 'h', 'd'.
I know that (\w)\1 will find repeated characters, and (?i)[b-df-hj-np-tv-z] will find all consonants, but how do I combine them?
Upvotes: 0
Views: 4578
Reputation:
Brute force (super slow) solution:
import re
expr = '(?<!b)b(?!b)|(?<!c)c(?!c)|(?<!d)d(?!d)|(?<!f)f(?!f)|(?<!g)g(?!g)|(?<!h)h(?!h)|(?<!j)j(?!j)|(?<!k)k(?!k)|(?<!l)l(?!l)|(?<!m)m(?!m)|(?<!n)n(?!n)|(?<!p)p(?!p)|(?<!q)q(?!q)|(?<!r)r(?!r)|(?<!s)s(?!s)|(?<!t)t(?!t)|(?<!v)v(?!v)|(?<!w)w(?!w)|(?<!x)x(?!x)|(?<!y)y(?!y)|(?<!z)z(?!z)'
print(re.findall(expr, 'happy'))
print(re.findall(expr, 'accomplished'))
print(re.findall(expr, 'happy accomplished'))
print(re.findall(expr, 'happy accccccompliiiiiiishedd'))
# Readable form of expr
# (?<!b)b(?!b)|
# (?<!c)c(?!c)|
# (?<!d)d(?!d)|
# (?<!f)f(?!f)|
# (?<!g)g(?!g)|
# (?<!h)h(?!h)|
# (?<!j)j(?!j)|
# (?<!k)k(?!k)|
# (?<!l)l(?!l)|
# (?<!m)m(?!m)|
# (?<!n)n(?!n)|
# (?<!p)p(?!p)|
# (?<!q)q(?!q)|
# (?<!r)r(?!r)|
# (?<!s)s(?!s)|
# (?<!t)t(?!t)|
# (?<!v)v(?!v)|
# (?<!w)w(?!w)|
# (?<!x)x(?!x)|
# (?<!y)y(?!y)|
# (?<!z)z(?!z)
Output:
['h', 'y']
['m', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h']
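The long alternation does not have to be typed out by hand. As a minimal sketch, assuming Python 3 and that only ASCII consonants matter, the same pattern can be generated from the alphabet. The names consonants, pattern and lone_consonants are illustrative and not part of the answer above, and re.IGNORECASE is an addition so that 'Happy' also yields 'H'.

```python
import re
import string

# All ASCII lowercase letters except the vowels.
consonants = [c for c in string.ascii_lowercase if c not in "aeiou"]

# Build one "(?<!x)x(?!x)" branch per consonant and join them with "|".
expr = "|".join(r"(?<!{0}){0}(?!{0})".format(c) for c in consonants)

# IGNORECASE is an assumption added here; the hand-written pattern is case-sensitive.
pattern = re.compile(expr, re.IGNORECASE)


def lone_consonants(text):
    """Return the consonants in text that are not part of a doubled run."""
    return pattern.findall(text)


print(lone_consonants("Happy"))         # ['H', 'y']
print(lone_consonants("accomplished"))  # ['m', 'p', 'l', 's', 'h', 'd']
```

Generating the pattern keeps the one-branch-per-letter idea but makes it easier to change the letter set later.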
Upvotes: 0
Reputation: 4485
You can use
(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)
which unfolds as
(?=[b-df-hj-np-tv-xz]) # Match only if the next character is a consonant
(.) # Match the consonant and capture it for subsequent usage
(?!\1) # Don't match if the next character is the same as the one we captured (avoids matching any character of a cluster except the last)
(?<!\1\1) # Don't match if the character before the one we captured was also the same (avoids matching the last character of a cluster)
but sadly that last line is not allowed in re, as lookbehinds must have fixed length. But the regex module¹ supports it:
In [1]: import regex
In [2]: s=r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)'
In [3]: regex.findall(s, 'happy')
Out[3]: ['h']
In [4]: regex.findall(s, 'accomplished')
Out[4]: ['m', 'p', 'l', 's', 'h', 'd']
¹ “intended eventually to replace Python’s current re module implementation” according to the cheeseshop description.
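The restriction described above may no longer apply: since Python 3.5, the standard re module accepts references to fixed-length groups inside lookbehinds. A minimal sketch, assuming Python 3.5 or newer; the names pattern and lone_consonants_re are illustrative and not taken from the answer.

```python
import re

# Same pattern as in the answer, compiled with the standard-library re module.
# Assumption: Python 3.5+; older versions do not support this lookbehind.
pattern = re.compile(r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)')


def lone_consonants_re(text):
    """Return consonants (y excluded by the character class) that are not doubled."""
    # With a single capturing group, findall returns the text of group 1.
    return pattern.findall(text)


print(lone_consonants_re('happy'))
print(lone_consonants_re('accomplished'))
```

Note that the character class leaves out 'y', which is why 'happy' yields only 'h' here, as in the transcript above.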
Upvotes: 4
Reputation: 785481
Here is a regex that can be used:
([^aeiou])\1+|([^aeiou\s])
You can then grab captured group #2
Explanation:
[^aeiou] # matches any character that is not a lowercase vowel (used here as a consonant)
([^aeiou]) # puts a consonant in captured group #1
([^aeiou])\1+ # matches repetitions of group #1
| # regex alternation (OR)
([^aeiou\s]) # matches a consonant and grabs it in captured group #2
Code:
>>> import re
>>> for m in re.finditer(r'([^aeiou])\1+|([^aeiou\s])', "accomplished"):
...     print(m.group(2))
...
None
m
p
l
s
h
d
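The None in the output comes from the first alternative matching the doubled 'cc', which leaves group #2 empty. A small sketch of how those entries could be filtered out, assuming Python 3; the helper name lone_consonants is hypothetical.

```python
import re

# Group 1 swallows runs of a repeated non-vowel; group 2 captures a single one.
PATTERN = re.compile(r'([^aeiou])\1+|([^aeiou\s])')


def lone_consonants(word):
    """Collect group 2 from every match, skipping matches where only group 1 took part."""
    return [m.group(2) for m in PATTERN.finditer(word) if m.group(2) is not None]


print(lone_consonants("accomplished"))  # ['m', 'p', 'l', 's', 'h', 'd']
```

Because [^aeiou] also accepts digits, punctuation and uppercase vowels, this sketch is only as strict as the original pattern.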
Upvotes: 0
Reputation: 209
from re import findall
string = "Happy you!"
res = []
for c in findall('[^aeiou]', string):
    if c not in res:
        res.append(c)
This filters out duplicates and uses the 're' module, as you required.
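A slightly tighter variant of the same de-duplication idea, as a sketch only: it assumes that just ASCII consonant letters should be kept (the class [^aeiou] also lets spaces and punctuation through) and relies on dict preserving insertion order, which holds on Python 3.7+. The function name unique_consonants is made up for illustration.

```python
import re


def unique_consonants(text):
    """Return each ASCII consonant once, in order of first appearance."""
    letters = re.findall(r'[b-df-hj-np-tv-z]', text, flags=re.IGNORECASE)
    # dict.fromkeys keeps the first occurrence of each key and drops later repeats.
    return list(dict.fromkeys(letters))


print(unique_consonants("Happy you!"))  # ['H', 'p', 'y']
```

Like the loop above, this removes repeats rather than excluding doubled letters, so 'p' still appears in the result.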
Upvotes: 0