Reputation: 51
I'm new to regex, and I want to find all instances of "po" and its variants (i.e. "p.o.", "p. o.", "p o") that AREN'T followed by "box", because I'm interested in purchase orders and not PO boxes. The code below isn't working: it still matches the po even when it's followed by "box". Any ideas?
import re

string = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"
re.findall(r' p\.?\s?o\.?(?!\s*box)', string)
# expected output
[' po', ' p.o.']
# actual output
[' po', ' p.o.', ' p.o', ' p.o', ' p.o']
Upvotes: 4
Views: 125
Reputation: 627292
You placed the lookahead after an optional pattern, and backtracking lets the engine match the string another way: when the lookahead fails after p.o., the engine gives back the trailing dot, and the lookahead then succeeds because the next characters are .box rather than box.
With possessive quantifiers, this is easy to solve by adding + after the \.? that comes right before the lookahead: p\.?\s?o\.?+(?!\s*box). That prevents the engine from backtracking into the \.? pattern.
However, Python's re module only supports possessive quantifiers from version 3.11 onward. For a pattern that works on any version, move the lookahead right after the o, the obligatory part, and add \.? to the lookahead:
r'p\.?\s?o(?!\.?\s*box)\.?'
          ^^^^^^^^^^^^^
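To see the difference, both patterns can be run against the sample string from the question. This is only a minimal sketch: the variable names are illustrative, and the leading space from the question's pattern is kept so the result lines up with the expected output.

```python
import re

# Sample string from the question.
text = " po pobox po box po box p.o. p.o.box p.o. box p.o. box"

# Original pattern: the lookahead sits after the optional trailing dot, so the
# engine can backtrack, give that dot back, and retry the lookahead in front
# of ".box" or ". box", where it no longer sees "box".
original = re.compile(r' p\.?\s?o\.?(?!\s*box)')

# Reworked pattern: the lookahead sits right after the mandatory "o" and
# accounts for the optional dot itself, so there is nothing to backtrack into.
fixed = re.compile(r' p\.?\s?o(?!\.?\s*box)\.?')

original_matches = original.findall(text)
fixed_matches = fixed.findall(text)

print(original_matches)  # [' po', ' p.o.', ' p.o', ' p.o', ' p.o']
print(fixed_matches)     # [' po', ' p.o.']
```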
See the regex demo. Add \b after box if you plan to match it as a whole word. Same with the first p: you may want to add a \b before it to match p as a whole word.
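The effect of the two \b additions can be sketched as follows. The sample text here is hypothetical (it is not from the question) and was picked so that the two variants disagree: "hippo" ends in "po", and "boxes" starts with "box".

```python
import re

# Hypothetical sample text, chosen to show what the word boundaries change.
text = "p o 7, p.o. boxes, hippo, p. o. box 9"

# Without boundaries: the "po" inside "hippo" matches, and "p.o. boxes" is
# rejected because the lookahead sees "box" at the start of "boxes".
without_boundaries = re.compile(r'p\.?\s?o(?!\.?\s*box)\.?')

# With boundaries: \b before p rejects the "po" inside "hippo", and \b after
# box means only the whole word "box" blocks a match, so "p.o. boxes" matches.
with_boundaries = re.compile(r'\bp\.?\s?o(?!\.?\s*box\b)\.?')

plain_matches = without_boundaries.findall(text)
bounded_matches = with_boundaries.findall(text)

print(plain_matches)    # ['p o', 'po']
print(bounded_matches)  # ['p o', 'p.o.']
```

Whether "p.o. boxes" should count as a purchase order is a judgment call for your data; drop the \b after box if it should not.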
Details
p - a p
\.? - an optional (1 or 0) dot
\s? - an optional (1 or 0) whitespace
o - an o
(?!\.?\s*box) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an optional dot, 0+ whitespaces and box
\.? - an optional (1 or 0) dot
Upvotes: 3