shillos

Reputation: 23

Writing a regex that finds 'zz' in a word but not at the start or the end

I am having some difficulty writing a regex that finds words in a text that contain 'zz', but not at the start or the end of the word. These are two of my many attempts:

pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')

Thanks

Upvotes: 1

Views: 244

Answers (5)

Casimir et Hippolyte

Reputation: 89557

You can use lookarounds:

\b(?!zz)\w+?zz\w+\b(?<!zz)


or without lookarounds:

\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b


Limited to lowercase ASCII letters, this last pattern can also be written as:

\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
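
A quick way to check the three patterns above is to run each one through Python's `re.findall` on a sample sentence. The sentence here is made up for illustration (the word "dazzle" was added so there is more than one expected hit); each pattern should return only the words whose zz sits strictly inside the word.

```python
import re

text = "lorem ipsum buzz mezzo mix zztop dazzle"

patterns = {
    # lookarounds: the word must not start with zz and must not end with zz
    "lookaround": r"\b(?!zz)\w+?zz\w+\b(?<!zz)",
    # no lookarounds: the word may begin/end with at most one z,
    # which must be next to a non-z word character
    "no_lookaround": r"\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b",
    # same idea, restricted to lowercase ASCII letters
    "ascii_only": r"\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b",
}

for name, pattern in patterns.items():
    print(name, re.findall(pattern, text))  # each prints ['mezzo', 'dazzle']
```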

Upvotes: 2

bobble bubble

Reputation: 18515

Another idea is to use non-word boundaries.

\B matches at any position between two word characters as well as at any position between two non-word characters ...

\w*\Bzz\B\w*

See this demo at regex101


Be aware that the pattern above also matches words containing three or more consecutive z. To match exactly two:

\w*(?<=[^\Wz])zz(?=[^\Wz])\w*

Another demo at regex101


Use either pattern with the (?i) flag for case-insensitive matching if needed.
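
To see the difference between the two patterns, the sketch below runs both through Python's `re.findall`. The sample sentence is invented for illustration; "buzzzer" (three z) is included only to show that the first pattern accepts a run of three z while the second does not.

```python
import re

text = "lorem ipsum buzz mezzo mix zztop buzzzer"

# \B on both sides: zz needs a word character before and after it
two_or_more = re.compile(r"\w*\Bzz\B\w*")
# lookarounds: zz must be flanked by word characters other than z
exactly_two = re.compile(r"\w*(?<=[^\Wz])zz(?=[^\Wz])\w*")

print(two_or_more.findall(text))  # ['mezzo', 'buzzzer']
print(exactly_two.findall(text))  # ['mezzo']

# a leading (?i) turns on case-insensitive matching for the whole pattern
print(re.findall(r"(?i)\w*\Bzz\B\w*", "Mezzo MEZZO buzz"))  # ['Mezzo', 'MEZZO']
```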

Upvotes: 3

Jan

Reputation: 43169

Well, the direct translation would be

\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b

See a demo on regex101.com.
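
In Python the same pattern can be passed straight to `re.findall`. A minimal sketch, reusing the sample sentence from the programmatic version of this answer, where it should agree with the split-based filter:

```python
import re

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# (?!zz) rejects words starting with zz; the tempered token
# (?:(?!zz\b)\w)+ consumes word characters but never steps onto
# a "zz" that ends the word
pattern = re.compile(r"\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b")

print(pattern.findall(text))  # ['mezzo']
```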


Programmatically, you could use

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# keep words that contain "zz" but neither start nor end with it
words = [word
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]

print(words)

Which yields

['mezzo']

See a demo on ideone.com.

Upvotes: 3

SztupY

Reputation: 10536

Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure the first and last letters are not z, and that there is a zz somewhere in between.

Something like

^[^z].*zz.*[^z]$

should work
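
Because the pattern is anchored with `^` and `$`, it describes a whole string rather than a word inside a longer text. One way to use it, assuming the input has already been split into words (the word list here is made up), is to test each word separately. Note that this check is stricter than the question asks: it also rejects words that start or end with a single z.

```python
import re

# ^ and $ anchor the pattern to the whole string, so test one word at a time
pattern = re.compile(r"^[^z].*zz.*[^z]$")

words = ["buzz", "mezzo", "zztop", "dazzle"]
matches = [word for word in words if pattern.search(word)]
print(matches)  # ['mezzo', 'dazzle']
```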

Upvotes: 0

ThePyGuy

Reputation: 18426

You can use negative lookahead and negative lookbehind assertions in the regex.

>>> import re
>>> text = 'ggksjdfkljggksldjflksddjgkjgg'
>>> re.findall('(?<!^)g{2}(?!$)', text)
['gg']
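
The example above works on the whole string, so `^` and `$` refer to its start and end. To apply the same lookarounds to the question's 'zz' case, one option (an adaptation, not part of the original answer) is to split the text into words and keep each word in which the pattern finds a match:

```python
import re

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# searched per word, so ^ and $ are the start and end of that word:
# (?<!^) : the zz may not begin at the start of the word
# (?!$)  : the zz may not finish at the end of the word
pattern = re.compile(r"(?<!^)zz(?!$)")

words = [word for word in text.split() if pattern.search(word)]
print(words)  # ['mezzo']
```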

Upvotes: 0
