PATHIK GHUGARE

Reputation: 155

Regex in python that matches a word containing 'z', not at the start or end of the word

Consider a sentence in which some words may or may not start or end with 'z'. I want to match only the words that contain a 'z' somewhere other than the first or last position.

This is my code:

import re
reg_9 = re.compile(r'\b[^z]\w+z\w+[^z]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))

As I understand this regex, it should match every string between word boundaries ('\b') that does not start with 'z' and does not end with 'z' (the [^z] at the start and end) but has a 'z' somewhere in between (the '\w+z\w+' part).

This is the output I get:

[' azb ', ' yeszishere ']

Can someone explain why the matched strings include those extra spaces at the start and end?

Upvotes: 1

Views: 800

Answers (2)

Wiktor Stribiżew

Reputation: 626927

The pattern for this task can look like

\b(?!DOES_NOT_START_WITH)(?=\w*?MUST_CONTAIN)\w+\b(?<!DOES_NOT_END_WITH)

You can use

import re
reg_9 = re.compile(r'\b(?!z)(?=\w*?z)\w+\b(?<!z)')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))
# => ['azb', 'yeszishere']


Details:

  • \b - word boundary
  • (?!z) - a negative lookahead: immediately on the right, there should be no z
  • (?=\w*?z) - a positive lookahead that requires a z after any zero or more word chars
  • \w+ - one or more word chars
  • \b - a word boundary
  • (?<!z) - a negative lookbehind, immediately on the left, there should be no z.
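The general template above can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: the function name words_containing and its parameters are hypothetical, and it assumes the required character and the forbidden first/last character are the same single character.

```python
import re

def words_containing(text, ch):
    """Return words that contain `ch` but neither start nor end with it.

    Hypothetical helper: fills the lookaround template for one character.
    """
    c = re.escape(ch)  # escape in case `ch` is a regex metacharacter
    pattern = re.compile(rf'\b(?!{c})(?=\w*?{c})\w+\b(?<!{c})')
    return pattern.findall(text)

sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(words_containing(sentence, "z"))  # ['azb', 'yeszishere']
print(words_containing(sentence, "s"))  # ['zatstart', 'yeszishere', 'z_is_op']
```

Note that the match is case-sensitive as written; passing re.IGNORECASE to re.compile would be one way to treat 'Z' like 'z'.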

Upvotes: 1

Tim Biegeleisen

Reputation: 521457

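The extra spaces appear because [^z] matches any character other than z, and that includes a space. You also need to make the \w+ optional, i.e. use \w* instead, so that a three-letter word such as azb can still match once the first and last characters are restricted to word characters. I would phrase your regex as: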

reg_9 = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))  # ['azb', 'yeszishere']

This regex pattern says to:

\b       match a word boundary
[^\WzZ]  match any word character OTHER than z or Z
\w*      zero or more word characters
z        z
\w*      zero or more word characters
[^\WzZ]  match any word character OTHER than z or Z
\b       match a word boundary
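To see where the spaces in the question's output come from, it can help to test the two character classes on their own. The snippet below is an illustrative sketch (the names loose and strict are made up for this example): [^z] accepts any character other than z, including a space, while [^\WzZ] accepts only word characters other than z and Z.

```python
import re

sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"

# [^z] matches ANY character except 'z', so a space qualifies.
print(bool(re.fullmatch(r'[^z]', ' ')))      # True
# [^\WzZ] rejects non-word characters as well as 'z' and 'Z'.
print(bool(re.fullmatch(r'[^\WzZ]', ' ')))   # False
print(bool(re.fullmatch(r'[^\WzZ]', 'a')))   # True

loose = re.compile(r'\b[^z]\w+z\w+[^z]\b')          # pattern from the question
strict = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')   # pattern from this answer

print(loose.findall(sentence))   # [' azb ', ' yeszishere ']
print(strict.findall(sentence))  # ['azb', 'yeszishere']
```

In the loose pattern, \b is satisfied at the edge between a word and the neighbouring space, so the space itself is consumed by [^z] and ends up in the match.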

Upvotes: 1
