Reputation: 23
I am having some difficulty writing a regular expression that finds words in a text that contain 'zz', but not at the start or the end of the word. These are two of my many attempts:
pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')
Thanks
Upvotes: 1
Views: 244
Reputation: 89557
You can use lookarounds:
\b(?!zz)\w+?zz\w+\b(?<!zz)
or not:
\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b
Limited to ASCII letters this last pattern can also be written:
\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
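To see how these three patterns behave, here is a minimal sketch that runs them against a sample sentence. The sample text and the variable names are my own assumptions, not part of the answer.

```python
import re

# Hypothetical sample text: "buzz" ends with zz, "zztop" starts with zz,
# "zzz" is nothing but z, and "mezzo" / "pizza" have zz in the middle.
text = "lorem ipsum buzz mezzo mix zztop pizza zzz"

# The three patterns from the answer above.
lookaround = re.compile(r"\b(?!zz)\w+?zz\w+\b(?<!zz)")
no_lookaround = re.compile(r"\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b")
ascii_only = re.compile(r"\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b")

lookaround_hits = lookaround.findall(text)
no_lookaround_hits = no_lookaround.findall(text)
ascii_only_hits = ascii_only.findall(text)

print(lookaround_hits)
print(no_lookaround_hits)
print(ascii_only_hits)
```

On this input all three patterns should pick out only the words with an inner zz.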
Upvotes: 2
Reputation: 18515
Another idea is to use non-word boundaries. \B matches at any position between two word characters as well as at any position between two non-word characters ...
\w*\Bzz\B\w*
Be aware that the above matches words with two or more consecutive z. For exactly two:
\w*(?<=[^\Wz])zz(?=[^\Wz])\w*
Use any of those patterns with the (?i) flag for caseless matching if needed.
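The difference between the two patterns shows up on a word with three z in a row. The following is a rough sketch with a made-up sample text (the word "puzzzle" is deliberately misspelled for the test); the variable names are assumptions of mine.

```python
import re

# Hypothetical sample: "puzzzle" contains three consecutive z.
text = "buzz mezzo zztop pizza puzzzle"

# zz surrounded by word characters on both sides (two or more z allowed).
loose = re.compile(r"\w*\Bzz\B\w*")
# zz surrounded by word characters that are not z (exactly two z).
exact = re.compile(r"\w*(?<=[^\Wz])zz(?=[^\Wz])\w*")

loose_hits = loose.findall(text)
exact_hits = exact.findall(text)

# Inline (?i) flag for caseless matching, as suggested in the answer.
caseless_hits = re.findall(r"(?i)\w*\Bzz\B\w*", "MEZZO Buzz")

print(loose_hits)     # includes "puzzzle"
print(exact_hits)     # excludes "puzzzle"
print(caseless_hits)
```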
Upvotes: 3
Reputation: 43169
Well, the direct translation would be
\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b
Programmatically, you could use
text = "lorem ipsum buzz mezzo mix zztop but this is all"
words = [word
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]
print(words)
Which yields
['mezzo']
See a demo on ideone.com.
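As a quick cross-check, the regex and the programmatic version can be run side by side on the same text. This is only a sketch; the names regex_words and split_words are mine.

```python
import re

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# Regex from the answer: no zz at the start, and no zz directly before
# the end of the word.
pattern = re.compile(r"\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b")
regex_words = pattern.findall(text)

# Programmatic version from the answer.
split_words = [word
               for word in text.split()
               if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]

print(regex_words)
print(split_words)
```

The two approaches agree on this input. They may differ on text with punctuation, because text.split() leaves punctuation attached to the words while \w does not match it.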
Upvotes: 3
Reputation: 10536
Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure the first and last letters are not z, and then that we have a zz somewhere in the text.
Something like
^[^z].*zz.*[^z]$
should work
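Because this pattern is anchored with ^ and $, it tests one whole string at a time, so one way to use it is to split the text and check each word. A minimal sketch, with a sample text of my own choosing:

```python
import re

# Anchored pattern from the answer: first and last character are not z,
# with zz somewhere in between.
pattern = re.compile(r"^[^z].*zz.*[^z]$")

# Hypothetical sample; "zazza" starts with a single z.
text = "lorem ipsum buzz mezzo mix zztop zazza"

words = [word for word in text.split() if pattern.match(word)]
print(words)
```

Note that "zazza" is rejected here even though it neither starts nor ends with zz, so this reading is stricter than "no zz at the start or end".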
Upvotes: 0
Reputation: 18426
You can use negative lookahead and negative lookbehind assertions in the regex.
>>> import re
>>> text = 'ggksjdfkljggksldjflksddjgkjgg'
>>> re.findall('(?<!^)g{2}(?!$)', text)
['gg']
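The same idea could be adapted to the 'zz' question by testing each word separately. This is a sketch under my own assumptions (sample text, per-word splitting), not something the answer spells out.

```python
import re

# Same lookaround idea, applied to zz: not at the very start, not at the very end.
inner_zz = re.compile(r"(?<!^)z{2}(?!$)")

# Hypothetical sample text.
text = "lorem buzz mezzo zztop pizza"

words = [word for word in text.split() if inner_zz.search(word)]
print(words)

# Caveat: this only checks where a zz occurs, so a word such as "zzzz"
# still passes because of the zz at index 1.
slips_through = inner_zz.search("zzzz") is not None
print(slips_through)
```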
Upvotes: 0