Shivendra

Reputation: 1096

RegEx pattern not behaving as wanted

I am using the following regex pattern:

[^A-Za-z](email,|help|BGN|won't|go|corner|issues|disconected|We|group|No|send|Bv|connecting|has|Pittsburgh,|Many|(Akustica,|Toluca|cannot|Restarting|they|not|PI2|one|condition|entire|LAN|experincing|bar|Exchange,|server|Are|PA)|OutLook|right|says|Rose|Montalvo|back|computer|are|Jane|thier|Disconnected|Nrd|and/or|network|for|Appears|e-mail|unable|Connected|then|Broadview,|issue|email|shows|available|be|we|exchange|error|address|based|My|Microsoft|received|working|created|receive|impacted|WIFI|through|connection|including|or|IL|outlook|via|facility|Everyone's|servers|Also|message|"The|your|Status|doesn't|service|SI-MBX82.de.bosch.com,|next|appears|"disconnected"|Encryption|eMail/file|today|"Waiting|"send/receive"|but|it|trying|SAP|disconnected|e-mails|this|getting|can|of|connect|Incorrect|manually|is|site|an|folder"|cant|Other|have|in|Receiving|if|Plant|no|SI-MBX80.de.bosch.com|that|when|online|persists."|Customer|administrator|users|update|applications|"Disconnected"|SI-MBX81.de.bosch.com|The|on|lower|Some|It|contact|In|the|having)[^A-Za-z]

When I apply it, it fails to find "Jane" in the sentence

 "Issue with eMail/file Encryption Incorrect email address created for Jane Rose Montalvo."

Jane is one of the alternatives in the pattern above, yet it is not matched.

What could be the reason?

Upvotes: 1

Views: 74

Answers (3)

wwii

Reputation: 23743

If for some reason you cannot or do not want to modify your pattern and you have overlapping matches that you want to capture, you can call the compiled pattern's search method in a loop, moving the starting point for each search to the character just after the beginning of the previous match.

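import re

# s is the sentence and p is the pattern string, both taken from the question.

# recursive: restart the search one character after the start of the previous match
def foo(s, p, start=0):
    m = p.search(s, start)
    if not m:
        return ''
    return m.group() + foo(s, p, m.start() + 1)

# iterative: same idea without recursion
def foo1(s, p):
    result = ''
    m = p.search(s, 0)
    while m:
        result += m.group()
        m = p.search(s, m.start() + 1)
    return result

print(foo(s, re.compile(p)))
print(foo1(s, re.compile(p)))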

>>> 

 eMail/file  Encryption  Incorrect  email  address  created  for  Jane  Rose  Montalvo.
 eMail/file  Encryption  Incorrect  email  address  created  for  Jane  Rose  Montalvo.
>>> 

Upvotes: 0

Avinash Raj

Reputation: 174696

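The matches overlap: the non-letter character that ends one match is the same character that would have to start the next one, and once it has been consumed it cannot be matched again. Put the whole pattern, with its capturing group, inside a lookahead in order to capture the overlapping words: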

(?=[^A-Za-z](email,|help|BGN|won't|go|corner|issues|disconected|We|group|No|send|Bv|connecting|has|Pittsburgh,|Many|(Akustica,|Toluca|cannot|Restarting|they|not|PI2|one|condition|entire|LAN|experincing|bar|Exchange,|server|Are|PA)|OutLook|right|says|Rose|Montalvo|back|computer|are|Jane|thier|Disconnected|Nrd|and/or|network|for|Appears|e-mail|unable|Connected|then|Broadview,|issue|email|shows|available|be|we|exchange|error|address|based|My|Microsoft|received|working|created|receive|impacted|WIFI|through|connection|including|or|IL|outlook|via|facility|Everyone's|servers|Also|message|"The|your|Status|doesn't|service|SI-MBX82\.de\.bosch\.com,|next|appears|"disconnected"|Encryption|eMail/file|today|"Waiting|"send/receive"|but|it|trying|SAP|disconnected|e-mails|this|getting|can|of|connect|Incorrect|manually|is|site|an|folder"|cant|Other|have|in|Receiving|if|Plant|no|SI-MBX80\.de\.bosch\.com|that|when|online|persists\."|Customer|administrator|users|update|applications|"Disconnected"|SI-MBX81\.de\.bosch\.com|The|on|lower|Some|It|contact|In|the|having)[^A-Za-z])
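To see the difference in Python, here is a minimal sketch. It assumes a shortened, hypothetical word list in place of the full alternation, and the variable names are mine, not from the question. It runs the consuming pattern and the lookahead pattern over the same sentence with re.findall:

```python
import re

# Hypothetical, shortened word list standing in for the full alternation.
words = ["eMail/file", "Encryption", "Incorrect", "email", "address",
         "created", "for", "Jane", "Rose", "Montalvo"]
alternation = "|".join(re.escape(w) for w in words)

sentence = ("Issue with eMail/file Encryption Incorrect email address "
            "created for Jane Rose Montalvo.")

# Question's style: the non-letter on each side is consumed by the match,
# so the next word loses the leading delimiter it needs.
consuming = re.compile(r"[^A-Za-z](" + alternation + r")[^A-Za-z]")

# Lookahead style: the match itself is zero-width, so nothing is consumed
# and every position in the sentence is tested independently.
lookahead = re.compile(r"(?=[^A-Za-z](" + alternation + r")[^A-Za-z])")

consumed_hits = consuming.findall(sentence)
lookahead_hits = lookahead.findall(sentence)

print(consumed_hits)   # every second word is skipped, including Jane
print(lookahead_hits)  # all ten words, including Jane
```

With a single capturing group, findall returns the captured word rather than the (empty) overall match, which is why the lookahead version still yields the words themselves.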


Upvotes: 2

vks

Reputation: 67968

The problem is that your regex consumes the non-letter character before and after each word, and that same character is also part of the matching criteria for the neighbouring word.

Hello Jane

Here, once Hello has been matched together with the space after it, only Jane is left, and it cannot be matched because the space before it has already been consumed. You should make the delimiter an assertion rather than a consuming match.

Use the lookbehind (?<=[^a-zA-Z]) instead of a plain [^a-zA-Z]. See the demo:

http://regex101.com/r/lU7jH1/9
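A small Python sketch of the same idea, using a hypothetical two-word alternation and a made-up sentence. Turning the trailing delimiter into a lookahead as well is my assumption; the lookbehind alone is already enough to recover Jane:

```python
import re

text = "Say Hello Jane now"

# Consuming version: the space after Hello becomes part of that match,
# so Jane has no unconsumed non-letter in front of it.
consuming = re.compile(r"[^A-Za-z](Hello|Jane)[^A-Za-z]")

# Assertion version: the delimiters are only checked, never consumed.
# The lookbehind is fixed-width (one character), which Python's re requires.
asserted = re.compile(r"(?<=[^A-Za-z])(Hello|Jane)(?=[^A-Za-z])")

consuming_hits = consuming.findall(text)
asserted_hits = asserted.findall(text)

print(consuming_hits)  # ['Hello']
print(asserted_hits)   # ['Hello', 'Jane']
```

Note that neither version matches a word at the very start or end of the string, because [^A-Za-z] still requires an actual character there.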

Upvotes: 2
