Reputation: 155
Consider a sentence in which some words may or may not start or end with 'z'.
This is my code:
import re
reg_9 = re.compile(r'\b[^z]\w+z\w+[^z]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))
With this regex I expect to match every string between word boundaries ('\b') that does not start with 'z' and does not end with 'z' (the [^z] at the start and end) but has a 'z' somewhere in between, which is what '\w+z\w+' is for.
This is the output I get:
[' azb ', ' yeszishere ']
Can someone tell me why the output strings contain those extra spaces at the start and end?
Upvotes: 1
Views: 800
Reputation: 626927
The pattern for this task can look like
\b(?!DOES_NOT_START_WITH)(?=\w*?MUST_CONTAIN)\w+\b(?<!DOES_NOT_END_WITH)
You can use
import re
reg_9 = re.compile(r'\b(?!z)(?=\w*?z)\w+\b(?<!z)')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence))
# => ['azb', 'yeszishere']
See the regex demo and the Python demo.
Details:
\b - a word boundary
(?!z) - immediately on the right, there should be no z
(?=\w*?z) - a positive lookahead that requires a z after any zero or more word chars
\w+ - one or more word chars
\b - a word boundary
(?<!z) - a negative lookbehind: immediately on the left, there should be no z.
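The generic template above can be sketched as a small helper. This is only an illustration: the function name build_word_pattern and its parameter names are hypothetical, and it assumes the three pieces are plain literal strings (Python's re module requires a lookbehind to have a fixed width, which an escaped literal satisfies).

```python
import re

def build_word_pattern(not_start, must_contain, not_end):
    """Hypothetical helper: fill the lookaround template with literal strings."""
    return re.compile(
        r'\b(?!{})(?=\w*?{})\w+\b(?<!{})'.format(
            re.escape(not_start),     # the word must not start with this
            re.escape(must_contain),  # the word must contain this
            re.escape(not_end),       # the word must not end with this
        )
    )

sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
pattern = build_word_pattern('z', 'z', 'z')
print(pattern.findall(sentence))
# => ['azb', 'yeszishere']
```

Because the lookarounds consume no characters, only \w+ contributes to the match, so the result can never include surrounding spaces.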
Upvotes: 1
Reputation: 521457
The extra spaces come from [^z], which matches any character other than z, including a space. You also need to make the \w+ optional, i.e. use \w* instead. I would phrase your regex as:
reg_9 = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')
sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"
print(reg_9.findall(sentence)) # ['azb', 'yeszishere']
This regex pattern says to:
\b match a word boundary
[^\WzZ] match any word character OTHER than z or Z
\w* zero or more word characters
z match a literal z
\w* zero or more word characters
[^\WzZ] match any word character OTHER than z or Z
\b match a word boundary
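As a quick check of the breakdown above, the following sketch compares the two character classes on a single space and then runs both patterns on the sentence from the question. The variable names original and fixed are just labels chosen for this illustration.

```python
import re

sentence = "this sentence contains zatstart azb pole ab noaz yeszishere z_is_op"

# [^z] means "any character except z", so a space qualifies.
print(bool(re.fullmatch(r'[^z]', ' ')))      # True
# [^\WzZ] also excludes non-word characters, so a space does not qualify.
print(bool(re.fullmatch(r'[^\WzZ]', ' ')))   # False

original = re.compile(r'\b[^z]\w+z\w+[^z]\b')
fixed = re.compile(r'\b[^\WzZ]\w*z\w*[^\WzZ]\b')

print(original.findall(sentence))  # [' azb ', ' yeszishere ']
print(fixed.findall(sentence))     # ['azb', 'yeszishere']
```

In the original pattern the \b can sit at the end of the previous word, which lets [^z] consume the space that follows it; once the first and last characters are restricted to word characters, the match has to begin and end inside the word.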
Upvotes: 1