Hemanth

Reputation: 203

logs regex matching issue

reLogExtractor = re.compile(
    # parse over "date mumble process[pid]: [mumble." (PID is optional)
    r'.*?\s[\w\-\.]*?(\[\d*\])?:\s*\[[\d*\]]*\]*'
    # any of the following
    r'(?:'
    r'(?P<logError>EMERG|EMERGENCY|ERR|ERROR|CRIT|CRITICAL|ALERT)|'
    r'(?P<logWarning>WARN|WARNING)|'
    r'(?P<logNotice>NOTICE)|'
    r'(?P<logNormal>[^\]]*)'
    # close'm, parse over the "]"
    r')\]')

Using this regex, I am trying to match the log lines log1, log2 and log3 below, but only log1 matches; the other two return None.

log1="Jun 23 08:29:13 blr-00 rscored[0000]: [ocd.auth_helper.WARNING] Error in message from 24214211.lab.heewt.com for server failed due to HTTPSConnectionPool(host='24214211.lab.htretr.com', port=443): Max retries exceeded with url: /api/cmc.auth/1.0/certificate (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)"
log2="Jun 7 12:42:02 brr-00 interceptor [0000]: [cluster/reader/[fdos:d7e2:2d90:1904::8]:7850-> [fd08:d7e2:2d90:1902::3]:15101.ERR] - {- -} Error reading header from [fd08:d7e2:2d90:1904::8]:7850: Connection reset by peer"
log3="Jun 3 13:01:58 blr-00 interceptor [0000]: [cluster/reader/[fdos:d7e2:2d90:1904::5]: 12264-> (fd08:d7e2:2d90: 1902: : 3]: 7850. WARN] - {- -} No heartbeat from channel [fd08:d7e2:2d90:1902:: 3]:7850 <=> [fd08:d7e2:2d90: 1904::5]:12264. Closing channel."

Is there a fix I need to make in reLogExtractor so that it matches all three?

Upvotes: 2

Views: 77

Answers (1)

Wiktor Stribiżew

Reputation: 627607

I suggest matching the rightmost status keyword that is immediately followed by ], or the substring between square brackets if there is no such keyword:

^[^][]*(\[\d*\])?:.*\b(?:(?P<logError>EMERG|EMERGENCY|ERR(?:OR)?|CRIT(?:ICAL)?|ALERT)|(?P<logWarning>WARN(?:ING)?)|(?P<logNotice>NOTICE)|\[(?P<logNormal>[^]]*))]

See the regex demo. In your code:

reLogExtractor = re.compile(
    # parse over "date mumble process[pid]:" (PID is optional), then skip
    # ahead to the rightmost word boundary that starts one of the alternatives
    r'^[^][]*(\[\d*\])?:.*\b'
    # any of the following
    r'(?:'
    r'(?P<logError>EMERG|EMERGENCY|ERR(?:OR)?|CRIT(?:ICAL)?|ALERT)|'
    r'(?P<logWarning>WARN(?:ING)?)|'
    r'(?P<logNotice>NOTICE)|'
    r'\[(?P<logNormal>[^\]]*)'
    # close'm, parse over the "]"
    r')\]')

See the Python demo. Details:

  • ^ - start of string
  • [^][]* - zero or more chars other than [ and ]
  • (\[\d*\])? - an optional Group 1: a [, zero or more digits, ]
  • : - a : char
  • .* - zero or more chars other than line break chars, as many as possible (so the rightmost candidate is tried first)
  • \b - a word boundary
  • (?:(?P<logError>EMERG|EMERGENCY|ERR(?:OR)?|CRIT(?:ICAL)?|ALERT)|(?P<logWarning>WARN(?:ING)?)|(?P<logNotice>NOTICE)|\[(?P<logNormal>[^]]*)) - either of
    • (?P<logError>EMERG|EMERGENCY|ERR(?:OR)?|CRIT(?:ICAL)?|ALERT)| - Group "logError": EMERG, EMERGENCY, ERR, ERROR, CRIT, CRITICAL or ALERT, or
    • (?P<logWarning>WARN(?:ING)?)| - Group "logWarning": WARN, WARNING, or
    • (?P<logNotice>NOTICE)| - Group "logNotice": NOTICE, or
    • \[(?P<logNormal>[^]]*) - [, Group "logNormal": any zero or more chars other than ]
  • ] - a ] char.
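
To check the breakdown above against the three lines from the question, here is a minimal sketch. The names LOG_LEVEL_RE and classify are hypothetical helpers introduced only for this illustration, and the pattern is written with CRIT(?:ICAL)? so that CRITICAL is accepted.

```python
import re

# Pattern from the answer; LOG_LEVEL_RE is a hypothetical name for this sketch.
LOG_LEVEL_RE = re.compile(
    r'^[^][]*(\[\d*\])?:.*\b'          # "date host process[pid]:", then greedy skip
    r'(?:'
    r'(?P<logError>EMERG|EMERGENCY|ERR(?:OR)?|CRIT(?:ICAL)?|ALERT)|'
    r'(?P<logWarning>WARN(?:ING)?)|'
    r'(?P<logNotice>NOTICE)|'
    r'\[(?P<logNormal>[^\]]*)'
    r')\]'                             # the alternative must be followed by "]"
)


def classify(line):
    """Return (group_name, matched_text) for the named group that matched, or None."""
    m = LOG_LEVEL_RE.search(line)
    if m is None:
        return None
    for name, value in m.groupdict().items():
        if value is not None:
            return name, value
    return None


# The three sample lines, copied from the question.
log1 = "Jun 23 08:29:13 blr-00 rscored[0000]: [ocd.auth_helper.WARNING] Error in message from 24214211.lab.heewt.com for server failed due to HTTPSConnectionPool(host='24214211.lab.htretr.com', port=443): Max retries exceeded with url: /api/cmc.auth/1.0/certificate (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)"
log2 = "Jun 7 12:42:02 brr-00 interceptor [0000]: [cluster/reader/[fdos:d7e2:2d90:1904::8]:7850-> [fd08:d7e2:2d90:1902::3]:15101.ERR] - {- -} Error reading header from [fd08:d7e2:2d90:1904::8]:7850: Connection reset by peer"
log3 = "Jun 3 13:01:58 blr-00 interceptor [0000]: [cluster/reader/[fdos:d7e2:2d90:1904::5]: 12264-> (fd08:d7e2:2d90: 1902: : 3]: 7850. WARN] - {- -} No heartbeat from channel [fd08:d7e2:2d90:1902:: 3]:7850 <=> [fd08:d7e2:2d90: 1904::5]:12264. Closing channel."

for label, line in (("log1", log1), ("log2", log2), ("log3", log3)):
    print(label, classify(line))
```

With these inputs, log1 and log3 should land in logWarning and log2 in logError. One thing to keep in mind: since \b precedes the whole alternation, the logNormal branch can only start at a [ that directly follows a word character, so a bracketed tag preceded by a space will not be captured by it.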

Upvotes: 2
