Reputation: 33
I am trying to capture this multiline log style in Python, similar to what a log parser does. Here is a log sample:
> [2019-11-21T00:58:47.922Z] This is a single log line
> [2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline
> log This is a multiline log This is a multiline log
> [2019-11-21T00:58:47.922Z] This is a single log line
> [2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline
> log This is a multiline log This is a multiline log
Unfortunately, the newline characters are messing me up. I've tried negative lookaheads, lookbehinds, etc. I can never capture more than a single log line. When I try to include the newlines, I end up capturing the entire log.
What Python regex can I use to capture each message individually?
I've tried stuff like:
regex = re.compile(r"^\[20.*Z\][\s\S]+", re.MULTILINE)
:(
Upvotes: 1
Views: 155
Reputation: 784938
You may use this regex in Python with a lookahead (compile it with re.MULTILINE, as in your attempt):
^\[20[^]]*Z\][\s\S]+?$(?=\n\[|\Z)
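A minimal sketch of how this pattern might be applied, assuming the whole log is already in memory as one string. The names `LOG`, `LOOKAHEAD_RE` and `messages` are illustrative and not part of the original answer; the sample text is the one from the question.

```python
import re

# Sample taken from the question; the variable names are illustrative.
LOG = (
    "[2019-11-21T00:58:47.922Z] This is a single log line\n"
    "[2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline\n"
    "log This is a multiline log This is a multiline log\n"
    "[2019-11-21T00:58:47.922Z] This is a single log line\n"
    "[2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline\n"
    "log This is a multiline log This is a multiline log"
)

# re.MULTILINE makes ^ and $ match at every line boundary, not only at the
# start and end of the whole string.
LOOKAHEAD_RE = re.compile(r"^\[20[^]]*Z\][\s\S]+?$(?=\n\[|\Z)", re.MULTILINE)

# The pattern has no capture groups, so findall returns the full matches.
messages = LOOKAHEAD_RE.findall(LOG)

for message in messages:
    print(repr(message))
```

Without re.MULTILINE, `^` would only match at the very start of the input, so at most one message could be found.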
RegEx Details:
^
: Start
\[20[^]]*Z\]
: Match date-time string wrapped as [...Z]
[\s\S]+?$
: Lazily match 1+ of any character, including line breaks, up to the end of a line
(?=\n\[|\Z)
: Positive lookahead condition to assert that we have a newline and start of date-time stamp [ at next position or it is end of input
Here is an alternate non-lookahead solution that is more efficient:
^\[20[^]]*Z\].*(?:\n[^[].*)*
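As a rough sketch of how this pattern could feed a small parser, the version below adds two named groups, `ts` and `body`, which are my additions for illustration and not part of the original regex. The helper name `parse_entries` is hypothetical, and the timestamp format string assumes every stamp looks like the ones in the question.

```python
import re
from datetime import datetime

# Same pattern as above with two named groups added for illustration:
# "ts" holds the timestamp, "body" holds the rest of the message.
ENTRY_RE = re.compile(
    r"^\[(?P<ts>20[^]]*Z)\](?P<body>.*(?:\n[^[].*)*)",
    re.MULTILINE,
)

def parse_entries(text):
    """Return (datetime, message) pairs, one per log entry."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        # Assumes stamps like 2019-11-21T00:58:47.922Z. The trailing Z is
        # matched literally, so the result is a naive datetime.
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
        entries.append((stamp, match.group("body").strip()))
    return entries

sample = (
    "[2019-11-21T00:58:47.922Z] This is a single log line\n"
    "[2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline\n"
    "log This is a multiline log This is a multiline log\n"
)
entries = parse_entries(sample)

for stamp, message in entries:
    print(stamp.isoformat(), repr(message))
```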
RegEx Details:
^
: Start
\[20[^]]*Z\]
: Match date-time string wrapped as [...Z]
.*
: Match rest of line (without line breaks)
(?:\n[^[].*)*
: Match remaining part of message that is a line-break followed by a non [ character at start
Upvotes: 2
Reputation: 163207
As an alternative, you could match the pattern that marks the start of a log entry using the square brackets, then repeatedly match all following lines that do not start with an opening square bracket:
^\[20[^\]]+Z\].*(?:\r?\n(?!\[).*)*
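A short sketch of this pattern on input with Windows line endings, which is the case the `\r?` is there for. The helper `one_line_messages` and the `crlf_log` sample are hypothetical, and joining continuation lines with a space is only one possible way to flatten a message.

```python
import re

# (?!\[) checks the next line's first character without consuming it, and
# \r?\n accepts both Unix and Windows line endings.
ENTRY_RE = re.compile(r"^\[20[^\]]+Z\].*(?:\r?\n(?!\[).*)*", re.MULTILINE)

def one_line_messages(text):
    """Return each log entry with its continuation lines joined by spaces."""
    # str.splitlines() treats \n, \r\n and a lone \r as line boundaries,
    # so any carriage return picked up by .* is removed here.
    return [" ".join(m.group().splitlines()) for m in ENTRY_RE.finditer(text)]

crlf_log = (
    "[2019-11-21T00:58:47.922Z] This is a single log line\r\n"
    "[2019-11-21T00:59:02.781Z] This is a multiline log This is a multiline\r\n"
    "log This is a multiline log This is a multiline log\r\n"
)
flat = one_line_messages(crlf_log)

for line in flat:
    print(line)
```

Note that `.` also matches `\r`, so the raw matches can end in a carriage return; the `splitlines()` step is what cleans that up.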
In parts, that will match:
^
Start of the string (or of each line when re.MULTILINE is set)
\[20[^\]]+Z\]
Match [20, then any char except ] 1+ times, and then Z]
.*
Match any char except a newline 0+ times
(?:
Non capturing group
\r?\n(?!\[)
Match a newline and assert that the next line does not start with [
.*
Match any char except a newline 0+ times
)*
Close non capturing group and repeat 0+ times
Upvotes: 2