Aadit

Reputation: 199

Regex for exception logs

I am using PCRE to match exception logs with the following regular expression.

Regular Expression

\[([\d -:]+)\]ERROR.*?(F:[^ ]+|F:).*?(?sx).*?(\b[a-zA-Z]*Exception\b)

Exception log samples

  1. Where the caught exception is on the same line as the log statement

    [2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx xxxx NullPointerException
            at com.package.name(b.java:20)
            at com.package.name.someClass.someMethod(P.java:2423)
            at com.package.name.someClass.someMethod(P.java:40)
            at com.package.name.someClass.someMethod(P.java:4054)
    
  2. Where the caught exception is on a later line of the same log statement

    [2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx xxxxxxxx xxxxxxxxxxxxxxxx 
    xxxxxxxxx xxxxxxxxxxx xxxxxxxx xxxx xxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxx xxxx NullPointerException
            at com.package.name(b.java:20)
            at com.package.name.someClass.someMethod(P.java:2423)
            at com.package.name.someClass.someMethod(P.java:40)
            at com.package.name.someClass.someMethod(P.java:4054)
    

The second sample is not matched by the mentioned regular expression.

I also tried using the multi-line flag (\m), but then the match does not stop in cases where it should not match anything.

Example

[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx xxxx xxxx
[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx xxxx xxxx
[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx xxxx xxxx 
[2020-03-07 01:05:37.512]ERROR [L:xx F:yy4 T:zz5 R: C: ] NullPointerException
            at com.package.name(b.java:20)
            at com.package.name.someClass.someMethod(P.java:2423)
            at com.package.name.someClass.someMethod(P.java:40)
            at com.package.name.someClass.someMethod(P.java:4054)

Expected Result

Group 1: 2020-03-07 01:05:37.512, Group 2: F:yy4, Group 3: NullPointerException

Actual Result

Group 1: 2020-03-07 01:02:37.512, Group 2: F:yy1, Group 3: NullPointerException

See how after matching the first line it doesn't stop until it finds the complete expression.

Can someone please help me out here?

Upvotes: 2

Views: 1713

Answers (1)

The fourth bird

Reputation: 163287

From the start of the pattern, you could assert that the next line does not start with, for example, [ and a digit, using a negative lookahead (?!.*\R\[\d), as the inline modifiers (?sx) only take effect afterwards.

This part (F:[^ ]+|F:) could be shortened to matching F: and 0+ times a non whitespace char (F:\S*)

In the character class [\d -:] the hyphen denotes a range from the space to the colon instead of matching only a hyphen char. If you meant to match it literally, you can for example move it to the end of the class and add the dot explicitly.
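
As a rough illustration of the range issue, the sketch below compares the two character classes using Python's re module (an assumption of this example; the question uses PCRE, but character-class ranges behave the same way here). The names loose, strict, timestamp and junk are made up for the example.

```python
import re

# ' -:' is a range from space (0x20) to colon (0x3A), so it also
# admits punctuation such as # $ % & ( ) * + , /
loose = re.compile(r"[\d -:]+")

# Hyphen moved to the end, so it is literal; the dot is listed explicitly.
strict = re.compile(r"[\d :.-]+")

timestamp = "2020-03-07 01:02:37.512"
junk = "#$%&()*+,/"  # all of these fall inside the space-to-colon range

loose_ts = bool(loose.fullmatch(timestamp))
strict_ts = bool(strict.fullmatch(timestamp))
loose_junk = bool(loose.fullmatch(junk))
strict_junk = bool(strict.fullmatch(junk))

print(loose_ts, strict_ts)      # both classes accept the timestamp
print(loose_junk, strict_junk)  # only the loose class accepts the punctuation
```

The timestamp only matched in the original pattern because the dot and the hyphen happen to lie inside that range.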

^(?!.*\R\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)(?sx).*?\b([a-zA-Z]*Exception)\b

Explanation

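  • ^ Start of string (or start of a line when the multiline flag is enabled, which is needed to match a log line that is not the first one)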
  • (?!.*\R\[\d) Negative lookahead, assert the next line does not start with [ and a digit where \R matches a unicode newline sequence
  • \[ Match [
  • ([\d :.-]+) Capture group 1, match 1+ times any of the listed chars (digit, space, colon, dot or hyphen)
  • ] Match ]
  • ERROR.*? Match ERROR and 0+ times any char except a newline, as few as possible
  • (F:\S*) Capture group 2, Match F: and 0+ times a non whitespace char
  • (?sx).*? Inline modifier s making the dot match a newline and x to ignore whitespace, then match any char including newlines, as few as possible
  • \b([a-zA-Z]*Exception)\b Capture group 3, Match 0+ times a char a-zA-Z followed by Exception
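
To try the pattern outside a PCRE tool, here is a minimal sketch in Python. It is an adaptation, not the PCRE original: Python's re module has no \R, so \n is used (assuming LF line endings), and the mid-pattern (?sx) is replaced by the scoped group (?s:.*?) because re only accepts global inline flags at the start. The x flag is dropped since the remaining pattern contains no whitespace. The names PATTERN, STACK, LOG_EXAMPLE and LOG_SAMPLE_2 are made up for the example.

```python
import re

# re.MULTILINE lets ^ match at the start of every log line.
PATTERN = re.compile(
    r"^(?!.*\n\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)(?s:.*?)\b([a-zA-Z]*Exception)\b",
    re.MULTILINE,
)

STACK = [
    "            at com.package.name(b.java:20)",
    "            at com.package.name.someClass.someMethod(P.java:2423)",
]

# The "Example" log from the question: three plain ERROR lines, then the exception.
LOG_EXAMPLE = "\n".join([
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx xxxx xxxx",
    "[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx xxxx xxxx",
    "[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx xxxx xxxx ",
    "[2020-03-07 01:05:37.512]ERROR [L:xx F:yy4 T:zz5 R: C: ] NullPointerException",
    *STACK,
])

# Sample 2 from the question: the exception name is on a continuation line.
LOG_SAMPLE_2 = "\n".join([
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx ",
    "xxxxxxxxx xxxxxxxxxxx xxxxxxxx",
    "xxxxxxxx xxxx NullPointerException",
    *STACK,
])

example_groups = PATTERN.search(LOG_EXAMPLE).groups()
sample_2_groups = PATTERN.search(LOG_SAMPLE_2).groups()

print(example_groups)   # timestamp 01:05:37.512, F:yy4, NullPointerException
print(sample_2_groups)  # timestamp 01:02:37.512, F:yy, NullPointerException
```

The first three lines of LOG_EXAMPLE are rejected by the lookahead because each is followed by a line starting with [ and a digit, so the match starts at the fourth line.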


Another option, without using the inline modifier s to make the dot match a newline, could be to optionally match all the lines that do not contain Exception after matching the ERROR and F: part.

^(?!.*\R\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)(?:(?!.*Exception|.*\R\[\d).*\R)*+.*\b([a-zA-Z]*Exception)\b
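
A hedged Python sketch of this second variant follows. Again it is an adaptation rather than the PCRE pattern itself: \R is written as \n (assuming LF line endings), and the possessive *+ is written as a plain * so the sketch also runs on Python versions before 3.11, which should only affect backtracking and not the result for these inputs. The names LINE_BY_LINE, find_exception, WITH_EXCEPTION and WITHOUT_EXCEPTION are made up for the example.

```python
import re

# Skip whole lines that neither contain "Exception" nor are followed by a
# new "[<digit>" log entry, then capture the exception name on the next line.
LINE_BY_LINE = re.compile(
    r"^(?!.*\n\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)"
    r"(?:(?!.*Exception|.*\n\[\d).*\n)*"
    r".*\b([a-zA-Z]*Exception)\b",
    re.MULTILINE,
)


def find_exception(log):
    """Return (timestamp, F: field, exception name) or None if nothing matches."""
    match = LINE_BY_LINE.search(log)
    return match.groups() if match else None


# Exception name on a continuation line, as in sample 2 of the question.
WITH_EXCEPTION = "\n".join([
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx ",
    "xxxxxxxxx xxxxxxxxxxx xxxxxxxx",
    "xxxxxxxx xxxx NullPointerException",
    "            at com.package.name(b.java:20)",
])

# Only plain ERROR lines: nothing should be reported.
WITHOUT_EXCEPTION = "\n".join([
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx xxxx xxxx",
    "[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx xxxx xxxx",
    "[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx xxxx xxxx ",
])

found = find_exception(WITH_EXCEPTION)
not_found = find_exception(WITHOUT_EXCEPTION)

print(found)      # timestamp, F:yy, NullPointerException
print(not_found)  # None
```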


Upvotes: 2
