Reputation: 1096
I am using the following regex pattern:
[^A-Za-z](email,|help|BGN|won't|go|corner|issues|disconected|We|group|No|send|Bv|connecting|has|Pittsburgh,|Many|(Akustica,|Toluca|cannot|Restarting|they|not|PI2|one|condition|entire|LAN|experincing|bar|Exchange,|server|Are|PA)|OutLook|right|says|Rose|Montalvo|back|computer|are|Jane|thier|Disconnected|Nrd|and/or|network|for|Appears|e-mail|unable|Connected|then|Broadview,|issue|email|shows|available|be|we|exchange|error|address|based|My|Microsoft|received|working|created|receive|impacted|WIFI|through|connection|including|or|IL|outlook|via|facility|Everyone's|servers|Also|message|"The|your|Status|doesn't|service|SI-MBX82.de.bosch.com,|next|appears|"disconnected"|Encryption|eMail/file|today|"Waiting|"send/receive"|but|it|trying|SAP|disconnected|e-mails|this|getting|can|of|connect|Incorrect|manually|is|site|an|folder"|cant|Other|have|in|Receiving|if|Plant|no|SI-MBX80.de.bosch.com|that|when|online|persists."|Customer|administrator|users|update|applications|"Disconnected"|SI-MBX81.de.bosch.com|The|on|lower|Some|It|contact|In|the|having)[^A-Za-z]
When I apply it, it fails to find "Jane"
in the sentence
"Issue with eMail/file Encryption Incorrect email address created for Jane Rose Montalvo."
Yet Jane is one of the alternatives in the pattern I am using.
What could be the reason?
Upvotes: 1
Views: 74
Reputation: 23743
If for some reason you cannot or do not want to modify your pattern, and you have overlapping matches that you want to capture, you can call re.search
in a loop, moving the starting point of each search to the character just after the beginning of the previous match.
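    import re

    # s is the sentence and p is the pattern string from the question.

    # recursive: join each match, restarting one character past its start
    def foo(s, p, start=0):
        m = p.search(s, start)
        if not m:
            return ''
        return m.group() + foo(s, p, m.start() + 1)

    # iterative: same idea with an explicit loop
    def foo1(s, p):
        result = ''
        m = p.search(s, 0)
        while m:
            result += m.group()
            m = p.search(s, m.start() + 1)
        return result

    print(foo(s, re.compile(p)))
    print(foo1(s, re.compile(p)))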
>>>
eMail/file Encryption Incorrect email address created for Jane Rose Montalvo.
eMail/file Encryption Incorrect email address created for Jane Rose Montalvo.
>>>
Upvotes: 0
Reputation: 174696
This happens because the matches overlap: adjacent words share the separator character between them. Use a capturing group inside a lookahead in order to capture the overlapping words:
(?=[^A-Za-z](email,|help|BGN|won't|go|corner|issues|disconected|We|group|No|send|Bv|connecting|has|Pittsburgh,|Many|(Akustica,|Toluca|cannot|Restarting|they|not|PI2|one|condition|entire|LAN|experincing|bar|Exchange,|server|Are|PA)|OutLook|right|says|Rose|Montalvo|back|computer|are|Jane|thier|Disconnected|Nrd|and/or|network|for|Appears|e-mail|unable|Connected|then|Broadview,|issue|email|shows|available|be|we|exchange|error|address|based|My|Microsoft|received|working|created|receive|impacted|WIFI|through|connection|including|or|IL|outlook|via|facility|Everyone's|servers|Also|message|"The|your|Status|doesn't|service|SI-MBX82\.de\.bosch\.com,|next|appears|"disconnected"|Encryption|eMail/file|today|"Waiting|"send/receive"|but|it|trying|SAP|disconnected|e-mails|this|getting|can|of|connect|Incorrect|manually|is|site|an|folder"|cant|Other|have|in|Receiving|if|Plant|no|SI-MBX80\.de\.bosch\.com|that|when|online|persists\."|Customer|administrator|users|update|applications|"Disconnected"|SI-MBX81\.de\.bosch\.com|The|on|lower|Some|It|contact|In|the|having)[^A-Za-z])
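As a rough illustration of the difference, here is a minimal sketch that uses a shortened word list in place of the full alternation (the list and the variable names are assumptions made for readability, not part of the original pattern). It compares the consuming pattern with the lookahead version on the sentence from the question.

```python
import re

# Shortened, hypothetical word list standing in for the full alternation.
words = ["eMail/file", "Encryption", "Incorrect", "email", "address",
         "created", "for", "Jane", "Rose", "Montalvo"]
alternation = "|".join(re.escape(w) for w in words)

sentence = ("Issue with eMail/file Encryption Incorrect email address "
            "created for Jane Rose Montalvo.")

# Original style: the separators on both sides are consumed by each match.
consuming = re.compile(r"[^A-Za-z](" + alternation + r")[^A-Za-z]")
# Lookahead style: nothing is consumed, so neighbouring words can share a separator.
lookahead = re.compile(r"(?=[^A-Za-z](" + alternation + r")[^A-Za-z])")

consumed_hits = consuming.findall(sentence)
lookahead_hits = lookahead.findall(sentence)

print(consumed_hits)   # every second word is skipped, including "Jane"
print(lookahead_hits)  # all ten words are reported
```

With the consuming pattern, each match swallows the space that the next word would need as its leading separator, so only every other word is found.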
Upvotes: 2
Reputation: 67968
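The problem is that your regex consumes the non-letter character (here a space) before and after each word, and that character is also part of the matching criteria.
Hello Jane
Here, once Hello is matched together with the space after it, Jane is left and cannot be matched, because the space before it has already been consumed. You should turn the separators into assertions rather than consuming them.
Use the lookbehind (?<=[^a-zA-Z]) instead of the leading [^a-zA-Z], and the lookahead (?=[^a-zA-Z]) instead of the trailing one. See demo.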
http://regex101.com/r/lU7jH1/9
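A minimal sketch of the assertion-based variant, again with a shortened, hypothetical word list rather than the full alternation from the question. The separators are checked by a lookbehind and a lookahead, so they are never consumed and each word can be matched directly.

```python
import re

# Shortened, hypothetical word list standing in for the full alternation.
words = ["eMail/file", "Encryption", "Incorrect", "email", "address",
         "created", "for", "Jane", "Rose", "Montalvo"]
alternation = "|".join(re.escape(w) for w in words)

sentence = ("Issue with eMail/file Encryption Incorrect email address "
            "created for Jane Rose Montalvo.")

# The separators are asserted, not consumed.
pattern = re.compile(r"(?<=[^A-Za-z])(" + alternation + r")(?=[^A-Za-z])")

found = [m.group(1) for m in pattern.finditer(sentence)]
print(found)
```

Note that (?<=[^A-Za-z]) still requires some character before the word, so a word at the very start of the string is not matched, just as with the original pattern. If that matters, the negative forms (?<![A-Za-z]) and (?![A-Za-z]) also succeed at the string boundaries.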
Upvotes: 2