Reputation: 199
I am using PCRE to match exception logs with the following regular expression.
Regular Expression
\[([\d -:]+)\]ERROR.*?(F:[^ ]+|F:).*?(?sx).*?(\b[a-zA-Z]*Exception\b)
Exception log samples
Where the caught exception is on the same line as the log statement
[2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx xxxx NullPointerException
at com.package.name(b.java:20)
at com.package.name.someClass.someMethod(P.java:2423)
at com.package.name.someClass.someMethod(P.java:40)
at com.package.name.someClass.someMethod(P.java:4054)
Where the caught exception is on a later line of the same log statement
[2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx xxxxxxxx xxxxxxxxxxxxxxxx
xxxxxxxxx xxxxxxxxxxx xxxxxxxx xxxx xxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxx xxxx NullPointerException
at com.package.name(b.java:20)
at com.package.name.someClass.someMethod(P.java:2423)
at com.package.name.someClass.someMethod(P.java:40)
at com.package.name.someClass.someMethod(P.java:4054)
The second sample is not matched by the regular expression above.
I also tried using the multi-line flag (\m), but then the match does not stop in cases where it should not match anything.
Example
[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx xxxx xxxx
[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx xxxx xxxx
[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx xxxx xxxx
[2020-03-07 01:05:37.512]ERROR [L:xx F:yy4 T:zz5 R: C: ] NullPointerException
at com.package.name(b.java:20)
at com.package.name.someClass.someMethod(P.java:2423)
at com.package.name.someClass.someMethod(P.java:40)
at com.package.name.someClass.someMethod(P.java:4054)
Expected Result
Group 1: 2020-03-07 01:05:37.512, Group 2: F:yy4, Group 3: NullPointerException
Actual Result
Group 1: 2020-03-07 01:02:37.512, Group 2: F:yy1, Group 3: NullPointerException
Notice that after matching the first line, it does not stop until it has matched the complete expression.
Can someone please help me out here?
Upvotes: 2
Views: 1713
Reputation: 163287
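At the start of the pattern, you could use a negative lookahead (?!.*\R\[\d) to assert that the next line does not start with, for example, [ and a digit. Placing it at the start keeps it ahead of the inline modifiers (?sx) that you use later in the pattern, so the dot inside the lookahead still stops at the end of the line.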
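To see what that lookahead does on its own, here is a minimal sketch in Python. It is an adaptation, not the PCRE pattern itself: Python's re module has no \R, so \r?\n is used as a stand-in, and re.MULTILINE is assumed so that ^ can match at the start of every line. The variable names are only illustrative.

```python
import re

# Assumption: \r?\n stands in for PCRE's \R, which Python's re lacks.
# re.MULTILINE lets ^ match at the start of every line.
entry_start = re.compile(r"^(?!.*\r?\n\[\d)\[", re.MULTILINE)

log = (
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:05:37.512]ERROR [L:xx F:yy4 T:zz5 R: C: ] NullPointerException\n"
    "at com.package.name(b.java:20)\n"
)

# Collect the lines at which a match is still allowed to start.
allowed_lines = [
    log[m.start():].split("\n", 1)[0] for m in entry_start.finditer(log)
]

for line in allowed_lines:
    print(line)
```

Only the 01:05:37.512 line is printed, because each of the first three lines is directly followed by a line that starts with [ and a digit.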
This part (F:[^ ]+|F:) could be shortened to F: followed by 0+ non-whitespace chars: (F:\S*)
In the character class [\d -:] the hyphen denotes a range (from the space to the colon) instead of only a hyphen char. If you meant to match it literally, you can, for example, move it to the end of the class and also add the dot.
^(?!.*\R\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)(?sx).*?\b([a-zA-Z]*Exception)\b
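A quick way to check the difference between the two character classes is the small Python sketch below (Python's re treats these two classes the same way as PCRE; the sample strings and variable names are my own). The original class accepts the timestamp only because the hyphen and the dot happen to lie inside the range from space to colon, and it accepts a lot of other punctuation as well.

```python
import re

original_class = re.compile(r"[\d -:]+")  # ' -:' is the range U+0020..U+003A
fixed_class = re.compile(r"[\d :.-]+")    # trailing '-' is a literal hyphen

timestamp = "2020-03-07 01:02:37.512"
punctuation = "#$%&'()*+,/"               # all inside the range ' ' to ':'

timestamp_ok_original = original_class.fullmatch(timestamp) is not None
timestamp_ok_fixed = fixed_class.fullmatch(timestamp) is not None
punctuation_ok_original = original_class.fullmatch(punctuation) is not None
punctuation_ok_fixed = fixed_class.fullmatch(punctuation) is not None

print(timestamp_ok_original, timestamp_ok_fixed)      # True True
print(punctuation_ok_original, punctuation_ok_fixed)  # True False
```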
Explanation
^ : Start of string (or of a line when the multiline flag is set)
(?!.*\R\[\d) : Negative lookahead, assert that the next line does not start with [ and a digit, where \R matches a Unicode newline sequence
\[ : Match [
([\d :.-]+) : Capture group 1, match 1+ times any of the listed chars
] : Match ]
ERROR.*? : Match ERROR and 0+ times any char except a newline, as few as possible
(F:\S*) : Capture group 2, match F: and 0+ times a non-whitespace char
(?sx).*? : Inline modifiers, s making the dot match a newline and x ignoring whitespace in the pattern, followed by 0+ times any char, as few as possible
\b([a-zA-Z]*Exception)\b : Capture group 3, match 0+ times a char a-zA-Z followed by Exception, between word boundaries

Another option, without using the inline modifier s to make the dot match a newline, is to optionally match all the lines that do not contain Exception after matching the ERROR and F: part.
^(?!.*\R\[\d)\[([\d :.-]+)]ERROR.*?(F:\S*)(?:(?!.*Exception|.*\R\[\d).*\R)*+.*\b([a-zA-Z]*Exception)\b
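If you want to try this second pattern outside a PCRE tester, the following is a rough Python port, with a few assumptions stated up front: \r?\n replaces \R (Python's re has no \R), re.MULTILINE is set so ^ matches at each line start, and the possessive *+ is only used on Python 3.11 or newer, where re supports it. The extract helper and the sample variable names are hypothetical and only there for illustration.

```python
import re
import sys

# Possessive quantifiers (*+) exist in Python's re only from 3.11 on. A greedy *
# should find the same matches for this pattern, it can just backtrack more.
loop_quantifier = "*+" if sys.version_info >= (3, 11) else "*"

# Assumption: \r?\n stands in for PCRE's \R, which Python's re lacks.
pattern = re.compile(
    r"^(?!.*\r?\n\[\d)"                  # next line must not start a new entry
    r"\[([\d :.-]+)]ERROR.*?(F:\S*)"     # group 1: timestamp, group 2: F: field
    r"(?:(?!.*Exception|.*\r?\n\[\d).*\r?\n)" + loop_quantifier +
    r".*\b([a-zA-Z]*Exception)\b",       # group 3: exception name
    re.MULTILINE,
)


def extract(log):
    """Return (timestamp, F: field, exception name) of the first match, or None."""
    match = pattern.search(log)
    return match.groups() if match else None


multi_line = (
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy T:zz R: C: ] xxxxxxx xxxxx xxxx\n"
    "xxxxxxxxx xxxxxxxxxxx xxxxxxxx xxxx xxxxxxxx\n"
    "xxxxxxxx xxxx NullPointerException\n"
    "at com.package.name(b.java:20)\n"
)

consecutive = (
    "[2020-03-07 01:02:37.512]ERROR [L:xx F:yy1 T:zz1 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:03:37.512]ERROR [L:xx F:yy2 T:zz2 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:04:37.512]ERROR [L:xx F:yy3 T:zz3 R: C: ] xxxxxxx xxxxx\n"
    "[2020-03-07 01:05:37.512]ERROR [L:xx F:yy4 T:zz5 R: C: ] NullPointerException\n"
    "at com.package.name(b.java:20)\n"
)

print(extract(multi_line))
# ('2020-03-07 01:02:37.512', 'F:yy', 'NullPointerException')
print(extract(consecutive))
# ('2020-03-07 01:05:37.512', 'F:yy4', 'NullPointerException')
```

The second result corresponds to the expected result in the question: the first three ERROR lines are skipped and the groups come from the 01:05:37.512 entry.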
Upvotes: 2