Reputation: 477
I am trying to write a regex for logs. It works fine for most log entries, but some entries contain carriage returns, and the regex then fails to pick up the lines that follow:
([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)
The above regex works fine for entries with no extra carriage return:
01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
But it fails to pick up "extra line 1" and "extra line 2" when one of the entries contains added carriage returns:
01 Jan 2018 04:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
02 Jan 2018 05:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
extra line 1
extra line 2
03 Jan 2018 08:25:56,546 [TEXT] aabb33-ddee33-54321 (host-1-usa-east) this.is.sample.log: service is responding normal
I even tried adding ^ to match the start of a line, but that only picks up the first log entry:
^([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*)
Upvotes: 0
Views: 1009
Reputation: 370989
You might use
(?<=\n|^)(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s?(.*?)(?=$|\n\d{2} [A-Za-z]{3} \d{4})
^^^^^^^^^                                                            ^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The important part is the lookahead at the end for a date or the end of the string. Also make sure to lazy-repeat the dot (.*?). The beginning also has a lookbehind for \n or ^ instead of the m flag, so that the $ in the lookahead at the end will only match the end of the string, not just the end of a line.
https://regex101.com/r/YAkWBe/1
Also remember that you can simplify [0-9] to \d.
If you can't use the s flag (which allows the dot to match a newline), then instead of repeating the dot to capture the (possibly multiline) string after the date, use [\s\S], which matches any character (every character is either whitespace or non-whitespace):
([\s\S]*?)
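As a rough sketch of how this approach could look in Python (an assumption, since the question does not name a language): Python's re module requires fixed-width lookbehinds, so this adaptation anchors each entry with ^ under re.MULTILINE and uses \Z, which matches only at the very end of the string, in place of $. The names LOG_ENTRY, parse_entries, and the shortened sample text are hypothetical.

```python
import re

# Adapted for Python's re module; not the exact pattern shown above.
LOG_ENTRY = re.compile(
    r"^(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s?"  # timestamp
    r"([\s\S]*?)"                          # message, lazy, may span lines
    r"(?=\n\d{2} [A-Za-z]{3} \d{4}|\Z)",   # stop before the next date or at end of string
    re.MULTILINE,                          # ^ matches at the start of every line
)

def parse_entries(text):
    """Return a list of (timestamp, message) tuples."""
    return LOG_ENTRY.findall(text)

# Shortened stand-in for the log lines in the question.
sample = (
    "01 Jan 2018 04:25:56,546 [TEXT] first\n"
    "02 Jan 2018 05:25:56,546 [TEXT] second\n"
    "extra line 1\n"
    "extra line 2\n"
    "03 Jan 2018 08:25:56,546 [TEXT] third"
)
entries = parse_entries(sample)
for timestamp, message in entries:
    print(timestamp, repr(message))
```

If the input ends with a trailing newline, the last message will include it, so stripping each message afterwards may be worthwhile.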
Upvotes: 1
Reputation: 521997
I can offer the following regex which works fine, except that it doesn't capture the very last log entry in your file:
([0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?)\s?(.*?)(?=[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3}))
Long story short, I added a lookahead to the end of your pattern, after the (.*), which stops the match when it encounters the start of the next log entry. The only other change is to use (.*?), i.e. make the dot lazy so that it stops at the lookahead.
Also, this regex should be run in dot-all mode, where .* matches across lines. If you don't have dot-all mode explicitly available, you may be able to use [\s\S]* as an alternative.
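To illustrate the last-entry limitation, here is a small Python sketch (Python is an assumption; the question does not name a language). It runs a pattern with the same structure in dot-all mode via re.DOTALL, then tries a hypothetical variant whose lookahead also accepts the end of the string (\Z). The names TS, as_posted, with_end, and the shortened sample text are made up for the example, and the milliseconds are left optional in both lookaheads.

```python
import re

# Timestamp sub-pattern, reused in the capture group and the lookahead.
TS = r"[0-9]{2}\s[A-Za-z]{3}\s[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}(?:,[0-9]{3})?"

# Same structure as the pattern above: the lookahead needs a following timestamp.
as_posted = re.compile("(" + TS + r")\s?(.*?)(?=" + TS + ")", re.DOTALL)

# Hypothetical variant: the lookahead also accepts the end of the string.
with_end = re.compile("(" + TS + r")\s?(.*?)(?=" + TS + r"|\Z)", re.DOTALL)

# Shortened stand-in for the log lines in the question.
sample = (
    "01 Jan 2018 04:25:56,546 [TEXT] first\n"
    "02 Jan 2018 05:25:56,546 [TEXT] second\n"
    "extra line 1\n"
    "extra line 2\n"
    "03 Jan 2018 08:25:56,546 [TEXT] third"
)

missed_last = as_posted.findall(sample)  # the final entry has no timestamp after it
found = with_end.findall(sample)

# The lookahead does not consume the newline before the next timestamp,
# so each captured message may end with "\n"; strip it if unwanted.
messages = [message.rstrip("\n") for _, message in found]
print(len(missed_last), len(found))
```

Whether the extra \Z alternative is appropriate depends on the regex engine in use; some engines spell the end-of-string anchor differently.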
Upvotes: 0