Martin Nielsen

Reputation: 2047

Matching single log entries with a Perl regex

Consider a log file using log4j syntax:

2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message
message message message
2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...
2014-02-10 08:45:00,010 DEBUG com.comnany.some.class An irrelevant debug message...

I need to write a matcher in Perl that matches all errors in the log file. Each match must contain not only the line that has ERROR in it, but also every following line up to (but not including) the start of the next log entry.

Can anyone come up with a regular expression to perform this match (preferably with an explanation)?

Upvotes: 0

Views: 385

Answers (2)

BergBrains

Reputation: 302

Loading the entire file to find multi-line log entries is a pretty bad idea. Consider the size of your log files, which now have to be loaded in their entirety into memory and processed all at once. Perl historically isn't very good at releasing memory...

A saner approach is to read the log line by line, either from the beginning or from a specific point. The loop checks whether each line starts a new entry and keeps a flag that records whether the current entry is an error, so continuation lines can be attached to it.

First note: consider pre-compiling your regexes with the qr// operator. That will save you a few cycles, particularly when you're iterating over many lines or otherwise using the same regex multiple times.

One other note regarding my code below: I like to use labels and next statements, because explicitly jumping to the next iteration of the loop clarifies the flow of the code.

The overall flow would be:

  1. Identify lines that are the start of a log entry;
  2. If the entry is an error, set a flag so you know to include any lines up to the next log entry start line;
  3. While the flag is set, treat each following line as part of the current error entry;
  4. Print each line that belongs to an error entry.

It might look something like this:

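use strict;
use warnings;

# Matches the start of a log entry: a timestamp followed by the log level.
my $log_entry_begin_regex = qr/^(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+)(?P<level>FATAL|WARN|ERROR|INFO|DEBUG|TRACE)\b/;

# Assumption: the log path is passed as the first command-line argument.
my $path_to_file = shift @ARGV;

my $found_error_flag = 0;
open my $file, "<", $path_to_file or die "Cannot open $path_to_file: $!";
LINE:
while ( my $line = <$file> ) {
    # It's a new log entry line
    if ( $line =~ $log_entry_begin_regex ) {

        if ( $+{level} eq 'ERROR' ) {
            $found_error_flag = 1;
            print $line;
            next LINE;
        } else {
            $found_error_flag = 0;
            next LINE;
        }
    } elsif ( $found_error_flag ) {
        # Continuation line of the current ERROR entry
        print $line;
    }
}
close $file;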
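
To try the flag approach without a log file on disk, the loop above can be wrapped in a small sub and fed the sample lines from the question. This is only a sketch: the sub name collect_errors, the @sample array, the ^ anchor and the named level capture are assumptions made for the example, and it gathers each entry into a string instead of printing lines as they arrive.

```perl
use strict;
use warnings;

# Entry-start pattern modelled on the one above; the ^ anchor and the
# named "level" capture are additions made for this sketch.
my $entry_start = qr/^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+(?<level>FATAL|WARN|ERROR|INFO|DEBUG|TRACE)\b/;

# Hypothetical helper: takes log lines, returns each ERROR entry as one string.
sub collect_errors {
    my @lines = @_;
    my @errors;
    my $in_error = 0;
    LINE:
    for my $line (@lines) {
        if ( $line =~ $entry_start ) {
            # A new entry starts here; remember whether it is an ERROR.
            $in_error = $+{level} eq 'ERROR';
            push @errors, $line if $in_error;
            next LINE;
        }
        # Continuation line: append it to the most recent ERROR entry, if any.
        $errors[-1] .= $line if $in_error;
    }
    return @errors;
}

# Sample data copied from the question.
my @sample = (
    "2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message\n",
    "message message message\n",
    "2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...\n",
    "2014-02-10 08:45:00,010 DEBUG com.comnany.some.class An irrelevant debug message...\n",
);

print for collect_errors(@sample);
```

Because only the current entry is held while scanning, the same sub could be adapted to read from a filehandle and emit each entry as soon as the next one begins.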

Upvotes: 0

Stephan

Reputation: 43013

Try this regex:

/(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+)ERROR\s+
 (?P<class>.+?)\s+
 (?P<message>.+?)(?=(?1)|$)/gsx

Modifiers

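g: Search globally (don't stop after the first match).
s: Dot also matches newline characters, so the message can span several lines.
x: Whitespace in the pattern is ignored, which allows the pattern to be split across lines.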
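
As a rough illustration of how this pattern might be applied in Perl, the sketch below holds the log in one string and walks the matches with a while loop. The sample text is the log from the question; the sub name extract_errors and the variable names are made up for the example, and it assumes the log is small enough to keep in memory. The lookahead (?=(?1)|$) is what ends the lazy message group: (?1) re-runs the first group (the timestamp), so the match stops just before the next timestamp or at the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash ref per ERROR entry found in $text.
sub extract_errors {
    my ($text) = @_;
    my @entries;
    while (
        $text =~ /(?P<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{1,3}\s+)ERROR\s+
                  (?P<class>.+?)\s+
                  (?P<message>.+?)(?=(?1)|$)/gsx
      )
    {
        # Copy the named captures out of %+ before the next match overwrites them.
        push @entries,
          { date => $+{date}, class => $+{class}, message => $+{message} };
    }
    return @entries;
}

# Sample data copied from the question.
my $log = <<'END';
2014-02-10 08:44:53,295 ERROR com.comnany.some.class Message
message message message
2014-02-10 08:44:53,995 WARN com.comnany.some.class An irrelevant warn message...
2014-02-10 08:45:00,010 DEBUG com.comnany.some.class An irrelevant debug message...
END

for my $entry ( extract_errors($log) ) {
    print "$entry->{class}: $entry->{message}";
}
```

Note that the message capture keeps the trailing newline of the entry, since the match runs right up to the next timestamp.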

Demo

http://regex101.com/r/rD8dI7

Upvotes: 2
