I'm trying to match all of the following cases except case 6 and case 8:
case 1 - deliverto should match
case 2 - deliveryto : should match
case 3 - deliveryto: should match
case 4 - delivery to : should match
case 5 - delivery address : should match
case 6 - delivery order : should NOT match
case 7 - ship to: should match
case 8 - delivery inst : should NOT match
case 9 - delivery should match
case 10 - remit to : should match
case 11 - send to: should match
case 12 - remitto: should match
case 13 - delivery: should match
case 14 - deliver: should match
case 15 - delv. : should match
My logic is: match the 1st chunk word [ship or send or remit or deliver or delivery or delv. (the dot is optional)] if the 2nd chunk [to or address] is found after it, or even if the 2nd chunk is not found; but don't take the 1st chunk [ship or ...] if the 3rd chunk [order or inst] follows it.
I've used a negative lookahead for the 3rd chunk followed by an optional positive lookahead for the 2nd chunk. Here is the regex I have been trying:
pattern = r"(send|remit|ship|delivery|deliver|delv\.?)\s?(?!(Order|inst))(?=(to|address)?)\:?"
The first problem I'm facing: the regex matches even when the 1st chunk is followed by the 3rd chunk.
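A quick diagnostic in Python (a sketch added for illustration, not part of the original post; the names `results` and `samples` are hypothetical) makes the first problem visible. For case 6 the match ends before the space: `\s?` first grabs the space, the negative lookahead then sees `order` and fails, so the engine backtracks and lets `\s?` match nothing. Now the lookahead sees ` order` with a leading space, which is not `order`, so it passes.

```python
import re

# The pattern from the question, unchanged.
pattern = r"(send|remit|ship|delivery|deliver|delv\.?)\s?(?!(Order|inst))(?=(to|address)?)\:?"

samples = ["case 4 - delivery to :", "case 6 - delivery order :"]
results = {}
for s in samples:
    m = re.search(pattern, s, re.IGNORECASE)
    results[s] = m.group()
    # Case 4 keeps the trailing space; case 6 drops it, which shows that
    # \s? gave the space back so the negative lookahead could succeed.
    print(repr(results[s]))
```

The only difference between the two outputs is the trailing space, and that missing space in case 6 is exactly the backtracking step that defeats the lookahead.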
The second problem: when the cases are in a list and I run re.finditer() on them, the optional 2nd chunk is not included in the match:
import re

l = ['case 1 - deliverto', 'case 2 - deliveryto :', 'case 3 - deliveryto: ', 'case 4 - delivery to :', 'case 5 - delivery address :', 'case 6 - delivery order :', 'case 7 - ship to:', 'case 8 - delivery inst :', 'case 9 - delivery ', 'case 10 - remit to :', 'case 11 - send to:', 'case 12 - remitto:', 'case 13 - delivery: ', 'case 14 - deliver: ', 'case 15 - delv. :']
for s in l:
    print([m.group() for m in re.finditer(pattern, s, re.IGNORECASE)])
gives:
['deliver']
['delivery']
['delivery']
['delivery ']
['delivery ']
['delivery']
['ship ']
['delivery']
['delivery ']
['remit ']
['send ']
['remit']
['delivery:']
['deliver:']
['delv. :']
I need the match to include the optional to or address chunk when it is present. What am I doing wrong in the regex?
For implementation details, have a look at this regex101 demo. Thanks.
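The second problem comes from the lookahead being zero-width: `(?=...)` only checks what follows and never adds characters to the match, and because the group inside it is optional, the lookahead always succeeds. The sketch below (hypothetical variable names, plain Python 3 `re`) contrasts it with an optional non-capturing group, which does consume `to` when it is there.

```python
import re

text = "deliveryto:"

# A lookahead only tests the upcoming text; "to" is not part of the match.
lookahead = re.search(r"delivery(?=(to|address)?)", text).group()

# An optional non-capturing group actually consumes "to" when present.
consuming = re.search(r"delivery(?:to|address)?", text).group()

# Because (to|address)? may match nothing, the lookahead can never fail,
# so it accepts "delivery order" as well.
always = re.search(r"delivery(?=(to|address)?)", "delivery order") is not None

print(lookahead)
print(consuming)
print(always)
```

This prints `delivery`, `deliveryto` and `True`, which is why the 2nd chunk never shows up in the question's output.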
You need to fail the match when the word right after the first word is order or inst, and run that check before the first word is consumed:
(?i)\b(?!\S+\s+(?:order|inst))(?:send|remit|ship|delivery?|delv\.?)(?:\s*(?:to|address))?\s?:?
See the regex demo
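As a local counterpart to the online demo, the following sketch runs the answer's regex over the list from the question (the names `cases` and `results` are just for illustration). The `(?i)` inline flag sits at the very start of the pattern, so no extra `re.IGNORECASE` argument is needed.

```python
import re

# The regex from this answer, copied verbatim.
pattern = r"(?i)\b(?!\S+\s+(?:order|inst))(?:send|remit|ship|delivery?|delv\.?)(?:\s*(?:to|address))?\s?:?"

cases = ['case 1 - deliverto', 'case 2 - deliveryto :', 'case 3 - deliveryto: ',
         'case 4 - delivery to :', 'case 5 - delivery address :', 'case 6 - delivery order :',
         'case 7 - ship to:', 'case 8 - delivery inst :', 'case 9 - delivery ',
         'case 10 - remit to :', 'case 11 - send to:', 'case 12 - remitto:',
         'case 13 - delivery: ', 'case 14 - deliver: ', 'case 15 - delv. :']

# One list of matched substrings per input string.
results = [[m.group() for m in re.finditer(pattern, s)] for s in cases]
for s, found in zip(cases, results):
    print(s, '->', found)
```

Cases 6 and 8 produce empty lists, and every other case now includes the to / address chunk (and a trailing colon) when present, e.g. `['delivery to :']` for case 4.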
Details:
(?i) - case-insensitive matching ON (same as re.I)
\b - word boundary
(?!\S+\s+(?:order|inst)) - fail the match if 1+ non-whitespace chars, 1+ whitespace chars and then order or inst appear immediately to the right
(?:send|remit|ship|delivery?|delv\.?) - send, remit, ship, deliver or delivery, delv or delv.
(?:\s*(?:to|address))? - an optional sequence of 0+ whitespaces and then to or address
\s? - an optional whitespace
:? - an optional colon.
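If the one-liner is hard to maintain, the same regex can be written with `re.VERBOSE` so each token carries the explanation from the list above. This is a sketch under the assumption that flags are passed as arguments instead of the inline `(?i)`; the names `ANSWER_VERBOSE`, `ANSWER_COMPACT` and `samples` are hypothetical. In verbose mode unescaped whitespace and `#` comments are ignored, and this pattern contains neither as literal characters, so the two forms should behave identically.

```python
import re

# The answer's regex, one token per line; whitespace and comments are ignored.
ANSWER_VERBOSE = re.compile(r"""
    \b                                      # word boundary
    (?!\S+\s+(?:order|inst))                # fail if the next word starts with order/inst
    (?:send|remit|ship|delivery?|delv\.?)   # send, remit, ship, deliver(y), delv(.)
    (?:\s*(?:to|address))?                  # optional "to"/"address", consumed
    \s?                                     # optional whitespace
    :?                                      # optional colon
""", re.IGNORECASE | re.VERBOSE)

# The compact form from the answer, for comparison.
ANSWER_COMPACT = re.compile(r"(?i)\b(?!\S+\s+(?:order|inst))(?:send|remit|ship|delivery?|delv\.?)(?:\s*(?:to|address))?\s?:?")

samples = ["case 4 - delivery to :", "case 6 - delivery order :", "case 15 - delv. :"]
same = all(
    [m.group() for m in ANSWER_VERBOSE.finditer(s)]
    == [m.group() for m in ANSWER_COMPACT.finditer(s)]
    for s in samples
)
print(same)
```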