Regex in python that matches a word containing 'z', not at the start or end of the word

Question

Consider a sentence in which some words may or may not start or end with 'z'. I want to match only the words that contain a 'z' somewhere in the middle, not at the start or end.

This is my code:

import re

reg_9 = re.compile(r'\b[^z]\w+z\w+[^z]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))

My intent with this regex is to match every string between word boundaries ('\b') that does not start with 'z' and does not end with 'z' (the [^z] at the start and end) but has a 'z' somewhere in between (the '\w+z\w+' part).

This is the output I get:

[' azb ', ' yeszishere ']

Can someone explain why the matched strings include those extra spaces at the start and end?

Tim Biegeleisen · Accepted Answer

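The extra spaces appear because [^z] matches any character other than z, including a space, so the leading and trailing [^z] each consume a space. Once the first and last characters are restricted to word characters, you also need to make the \w+ optional, i.e. use \w* instead, so that short words such as azb can still match. I would phrase your regex as: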

import re

reg_9 = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))  # ['azb', 'yeszishere']

This regex pattern says to:

\b       match a word boundary
[^\WzZ]  match any word character OTHER than z or Z
\w*      zero or more word characters
z        match a literal lowercase z
\w*      zero or more word characters
[^\WzZ]  match any word character OTHER than z or Z
\b       match a word boundary
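
To see the difference side by side, here is a minimal sketch that runs both patterns on the sentence from the question. The variable names loose, strict, and extra are only illustrative, and the extra string is a made-up input (not from the question) that probes a few edge cases: a three-letter word, a word starting with an uppercase Z, a word with two z's in the middle, and a word made only of z's.

```python
import re

sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"

# Original pattern: [^z] accepts ANY character except 'z', spaces included,
# so the match can swallow the space on either side of the word.
loose = re.compile(r'\b[^z]\w+z\w+[^z]\b')

# Revised pattern: [^\WzZ] accepts only word characters other than 'z' or 'Z',
# so the first and last matched characters must belong to the word itself.
strict = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')

loose_matches = loose.findall(sentence)
strict_matches = strict.findall(sentence)

print(loose_matches)   # [' azb ', ' yeszishere ']
print(strict_matches)  # ['azb', 'yeszishere']

# Hypothetical extra input (an assumption for illustration only).
extra = "aza Zaza bzzb zz"
extra_matches = strict.findall(extra)
print(extra_matches)   # ['aza', 'bzzb']
```

Note that the revised pattern only looks for a lowercase z in the middle of the word, while the character classes reject both z and Z at the edges. If an uppercase Z in the middle should count too, replacing the literal z with [zZ] is one option.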
