Reputation: 2047
Consider a log file using log4j syntax:
2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message
message message message
2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...
2014-02-10 08:45:00,010 DEBUG com.comnany.some.class An irrelevant debug message...
I need to write a matcher in Perl that matches all errors in the log file. Each match must contain not only the line that has ERROR in it, but also every following line up to (but not including) the start of the next log entry.
Can anyone come up with a regular expression to perform this match (preferably with an explanation)?
Upvotes: 0
Views: 385
Reputation: 302
Loading the entire file to find multi-line log entries is a pretty bad idea. Consider the size of your log files, which now have to be loaded in their entirety into memory and processed all at once. Perl historically isn't very good at releasing memory...
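A saner approach is to process the log line by line, either from the beginning or from a specific point. In the loop, check whether each line starts a new entry and set a flag when that entry is an ERROR; lines that do not start a new entry are continuation lines, and they are printed only while the flag is set.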
First note, consider pre-compiling your regexes using the qr() operator. That will save you a few cycles, particularly when you're iterating over multiple lines or otherwise using the same regex multiple times.
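As a minimal sketch of that note, the snippet below compiles a pattern once with qr and reuses the compiled object on each line. The variable names ($level_regex, @lines, @levels) are illustrative and not part of the original answer, and the sample lines are copied from the question. How many cycles this actually saves depends on the Perl version and the pattern, so treat the benefit as an assumption to measure.

```perl
use strict;
use warnings;

# Compile the level pattern once; $level_regex is an illustrative name.
my $level_regex = qr/\b(FATAL|WARN|ERROR|INFO|DEBUG|TRACE)\b/;

# Sample lines copied from the question.
my @lines = (
    "2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message",
    "message message message",
    "2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...",
);

# Reuse the compiled object on every line instead of repeating the pattern.
my @levels;
for my $line (@lines) {
    # Store the captured level, or undef for lines that carry no level.
    push @levels, $line =~ $level_regex ? $1 : undef;
}
```

The same compiled object can also be interpolated into a larger pattern, which keeps the level list in one place.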
One other note regarding my code below: I like to use labels and next statements, because explicitly jumping to the next iteration of the loop clarifies the flow of the code.
The overall flow might look something like this:
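use strict;
use warnings;

# Assumption: the log path is passed as the first command-line argument.
my $path_to_file = shift @ARGV or die "Usage: $0 LOGFILE\n";

# Start of a log entry: a timestamp at the beginning of the line, then the level.
my $log_entry_begin_regex = qr/^(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3})\s+(?P<level>FATAL|WARN|ERROR|INFO|DEBUG|TRACE)\b/;
my $found_error_flag = 0;

open my $file, "<", $path_to_file
    or die "Cannot open $path_to_file: $!";

LINE:
while ( my $line = <$file> ) {
    # It's a new log entry line
    if ( $line =~ $log_entry_begin_regex ) {
        if ( $+{level} eq 'ERROR' ) {
            # Remember that we are inside an ERROR entry
            $found_error_flag = 1;
            print $line;
            next LINE;
        } else {
            # Any other level ends the previous ERROR entry
            $found_error_flag = 0;
            next LINE;
        }
    } elsif ( $found_error_flag ) {
        # Continuation line of an ERROR entry
        print $line;
    }
}
close $file;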
Upvotes: 0
Reputation: 43013
Try this regex:
/(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+)ERROR\s+
(?P<class>.+?)\s+
(?P<message>.+?)(?=(?1)|$)/gsx
g: Search globally (don't return on first match).
s: Dot matches newline characters.
x: Whitespace in the pattern is ignored, so the pattern can be split across several lines.
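A hedged usage sketch of the regex above: it has to run against the whole log held in one string, so the sample from the question is inlined in a heredoc here, and a real script would read the file into $log instead. The names $log, $error_entry_regex and @errors are illustrative. Note that the captured message keeps its trailing newline, and the date capture keeps its trailing whitespace.

```perl
use strict;
use warnings;

# Sample input copied from the question; a real script would slurp the log file.
my $log = <<'END';
2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message
message message message
2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...
2014-02-10 08:45:00,010 DEBUG com.comnany.some.class An irrelevant debug message...
END

# The regex from the answer; (?1) re-runs the first group (the date pattern).
my $error_entry_regex = qr/
    (?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+)ERROR\s+
    (?P<class>.+?)\s+
    (?P<message>.+?)(?=(?1)|$)
/sx;

# Collect every ERROR entry, including its continuation lines.
my @errors;
while ( $log =~ /$error_entry_regex/g ) {
    push @errors, {
        date    => $+{date},
        class   => $+{class},
        message => $+{message},
    };
}

print "$_->{class}: $_->{message}" for @errors;
```

Because the date pattern is not anchored to the start of a line, a message that itself contains a timestamp would be cut short at that point; anchoring with ^ and the m flag is one possible refinement.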
Upvotes: 2